Re: [HACKERS] PATCH: Batch/pipelining support for libpq

Andres Freund Wed, 23 Aug 2017 02:41:58 -0700

Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some
> >more intelligence is going to be needed.
> 
> Firstly, sorry for the delayed response as I got busy with other
> activities.


No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.


> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
> 
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.


> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...



> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage 
> to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.


> +   <para>
> +    Batching is most useful when the server is distant, i.e. network latency
> +    (<quote>ping time</quote>) is high, and when many small operations are 
> being performed in
> +    rapid sequence. There is usually less benefit in using batches when each
> +    query takes many multiples of the client/server round-trip time to 
> execute.
> +    A 100-statement operation run on a server 300ms round-trip-time away 
> would take
> +    30 seconds in network latency alone without batching; with batching it 
> may spend
> +    as little as 0.3s waiting for results from the server.
> +   </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.


> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed 
> into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiarry of this?


> +   <para>
> +    Batching is less useful when information from one operation is required 
> by the
> +    client before it knows enough to send the next operation.

s/less/not/


> +   <note>
> +    <para>
> +     The batch API was introduced in PostgreSQL 10.0, but clients using 
> PostgresSQL 10.0 version of libpq can
> +     use batches on server versions 8.4 and newer. Batching works on any 
> server
> +     that supports the v3 extended query protocol.
> +    </para>
> +   </note>

Where's the 8.4 coming from?


> +   <para>
> +    The client uses libpq's asynchronous query functions to dispatch work,
> +    marking the end of each batch with <function>PQbatchSyncQueue</function>.
> +    And to get results, it uses <function>PQgetResult</function> and
> +    <function>PQbatchProcessQueue</function>. It may eventually exit
> +    batch mode with <function>PQexitBatchMode</function> once all results are
> +    processed.
> +   </para>
> +
> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If 
> used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server 
> will
> +     block trying to send results from queries it has already processed to 
> the
> +     client. This only occurs when the client sends enough queries to fill 
> its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbound memory?  We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.


> +   <sect3 id="libpq-batch-sending">
> +    <title>Issuing queries</title>
> +
> +    <para>
> +     After entering batch mode the application dispatches requests
> +     using normal asynchronous <application>libpq</application> functions 
> such as 
> +     <function>PQsendQueryParams</function>, 
> <function>PQsendPrepare</function>,
> +     <function>PQsendQueryPrepared</function>, 
> <function>PQsendDescribePortal</function>,
> +     <function>PQsendDescribePrepared</function>.
> +     The asynchronous requests are followed by a <link
> +     
> linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link>
>  call to mark
> +     the end of the batch. The client <emphasis>does not</emphasis> need to 
> call
> +     <function>PQgetResult</function> immediately after dispatching each
> +     operation. <link linkend="libpq-batch-results">Result processing</link>
> +     is handled separately.
> +    </para>
> +    
> +    <para>
> +     Batched operations will be executed by the server in the order the 
> client
> +     sends them. The server will send the results in the order the statements
> +     executed. The server may begin executing the batch before all commands
> +     in the batch are queued and the end of batch command is sent. If any
> +     statement encounters an error the server aborts the current transaction 
> and
> +     skips processing the rest of the batch. Query processing resumes after 
> the
> +     end of the failed batch.
> +    </para>

Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...


> +    <note>
> +     <para>
> +      The client must not assume that work is committed when it
> +      <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +      corresponding result is received to confirm the commit is complete.
> +      Because errors arrive asynchronously the application needs to be able 
> to
> +      restart from the last <emphasis>received</emphasis> committed change 
> and
> +      resend work done after that point if something goes wrong.
> +     </para>
> +    </note>

This seems fairly independent of batching.


> +   </sect3>
> +
> +   <sect3 id="libpq-batch-interleave">
> +    <title>Interleaving result processing and query dispatch</title>
> +
> +    <para>
> +     To avoid deadlocks on large batches the client should be structured 
> around
> +     a nonblocking I/O loop using a function like 
> <function>select</function>,
> +     <function>poll</function>, <function>epoll</function>,
> +     <function>WaitForMultipleObjectEx</function>, etc.
> +    </para>
> +
> +    <para>
> +     The client application should generally maintain a queue of work still 
> to
> +     be dispatched and a queue of work that has been dispatched but not yet 
> had
> +     its results processed.

Hm. Why? If queries are just issued, no such queue is required?


> When the socket is writable it should dispatch more
> +     work. When the socket is readable it should read results and process 
> them,
> +     matching them up to the next entry in its expected results queue. 
> Batches
> +     should be scoped to logical units of work, usually (but not always) one
> +     transaction per batch. There's no need to exit batch mode and re-enter 
> it
> +     between batches or to wait for one batch to finish before sending the 
> next.
> +    </para>

This really needs to take memory usage into account.

> +       </variablelist>
> +
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQenterBatchMode">
> +     <term>
> +      <function>PQenterBatchMode</function>
> +      <indexterm>
> +       <primary>PQenterBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to enter batch mode if it is currently idle or
> +      already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> +        </para>
> +        <para>
> +          Returns 1 for success. Returns 0 and has no 
> +          effect if the connection is not currently idle, i.e. it has a 
> result 
> +          ready, is waiting for more input from the server, etc. This 
> function 
> +          does not actually send anything to the server, it just changes the 
> +          <application>libpq</application> connection state.
> +
> +        </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQexitBatchMode">
> +     <term>
> +      <function>PQexitBatchMode</function>
> +      <indexterm>
> +       <primary>PQexitBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to exit batch mode if it is currently in batch mode
> +      with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +        </para>
> +        <para>Returns 1 for success.
> +      Returns 1 and takes no action if not in batch mode. If the connection 
> has

"returns 1"?


> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Delimits the end of a set of a batched commands by sending a <link
> +      linkend="protocol-flow-ext-query">sync message</link> and flushing
> +      the send buffer. The end of a batch serves as 
> +      the delimiter of an implicit transaction and
> +      an error recovery point; see <link linkend="libpq-batch-errors">
> +      error handling</link>.

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


> +    <varlistentry id="libpq-PQbatchQueueCount">
> +     <term>
> +      <function>PQbatchQueueCount</function>
> +      <indexterm>
> +       <primary>PQbatchQueueCount</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Returns the number of queries still in the queue for this batch, not
> +      including any query that's currently having results being processed.
> +      This is the number of times <function>PQbatchProcessQueue</function> 
> has to be
> +      called before the query queue is empty again.
> +
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
> +      </para>
> +     </listitem>
> +    </varlistentry>

Given that apps are supposed to track this, I'm not sure why we have
this?


> +/*
> + * PQrecyclePipelinedCommand
> + *   Push a command queue entry onto the freelist. It must be a dangling 
> entry
> + *   with null next pointer and not referenced by any other entry's next 
> pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +     if (entry == NULL)
> +             return;
> +     if (entry->next != NULL)
> +     {
> +             fprintf(stderr, libpq_gettext("tried to recycle non-dangling 
> command queue entry"));
> +             abort();

Don't think we use abort() in libpq like that. There's some Assert()s
tho.


>  static bool
>  PQsendQueryStart(PGconn *conn)
> @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn)
>                                                 libpq_gettext("no connection 
> to the server\n"));
>               return false;
>       }
> -     /* Can't send while already busy, either. */
> -     if (conn->asyncStatus != PGASYNC_IDLE)
> +     /* Can't send while already busy, either, unless enqueuing for later */
> +     if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == 
> PQBATCH_MODE_OFF)
>       {
>               printfPQExpBuffer(&conn->errorMessage,
>                                                 libpq_gettext("another 
> command is already in progress\n"));
>               return false;
>       }
>  
> -     /* initialize async result-accumulation state */
> -     pqClearAsyncResult(conn);
> +     if (conn->batch_status != PQBATCH_MODE_OFF)
> +     {
> +     /*

Weirdly indented.


> +     if (conn->batch_status != PQBATCH_MODE_OFF)
> +     {
> +             pipeCmd = PQmakePipelinedCommand(conn);
> +
> +             if (pipeCmd == NULL)
> +                     return 0;                       /* error msg already 
> set */
> +
> +             last_query = &pipeCmd->query;
> +             queryclass = &pipeCmd->queryclass;
> +     }
> +     else
> +     {
> +             last_query = &conn->last_query;
> +             queryclass = &conn->queryclass;
> +     }

The amount of complexity / branches we're adding to all of these is more
than a bit unsightly.


> +/*
> + * PQbatchQueueCount
> + *   Return number of queries currently pending in batch mode
> + */
> +int
> +PQbatchQueueCount(PGconn *conn)
> +{
> +     int                     count = 0;
> +     PGcommandQueueEntry *entry;
> +
> +     if (PQbatchStatus(conn) == PQBATCH_MODE_OFF)
> +             return 0;
> +
> +     entry = conn->cmd_queue_head;
> +     while (entry != NULL)
> +     {
> +             ++count;
> +             entry = entry->next;
> +     }
> +     return count;
> +}

Ugh, O(N)? In that case I'd rather just remove this.


> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.


> + *   Put an idle connection in batch mode. Commands submitted after this
> + *   can be pipelined on the connection, there's no requirement to wait for
> + *   one to finish before the next is dispatched.
> + *
> + *   Queuing of new query or syncing during COPY is not allowed.

+"a"?

> + *   A set of commands is terminated by a PQbatchQueueSync. Multiple sets of 
> batched
> + *   commands may be sent while in batch mode. Batch mode can be exited by
> + *   calling PQbatchEnd() once all results are processed.
> + *
> + *   This doesn't actually send anything on the wire, it just puts libpq
> + *   into a state where it can pipeline work.
> + */
> +int
> +PQenterBatchMode(PGconn *conn)
> +{
> +     if (!conn)
> +             return false;

true/false isn't quite in line with int return code.


> +/*
> + * PQbatchEnd

wrong name.

> + *   End batch mode and return to normal command mode.
> + *
> + *   Has no effect unless the client has processed all results
> + *   from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard to diagnose errors.


> + *   Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> +     if (!conn)
> +             return false;
> +
> +     if (conn->batch_status == PQBATCH_MODE_OFF)
> +             return true;
> +
> +     switch (conn->asyncStatus)
> +     {
> +             case PGASYNC_IDLE:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, IDLE in 
> batch mode"));
> +                     break;
> +             case PGASYNC_COPY_IN:
> +             case PGASYNC_COPY_OUT:
> +             case PGASYNC_COPY_BOTH:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, COPY in 
> batch mode"));
> +                     break;

So we'll still check the queue in this case, that's a bit weird?



> +/*
> + * PQbatchQueueSync

Out of sync.

> + *   End a batch submission by sending a protocol sync. The connection will
> + *   remain in batch mode and unavailable for new non-batch commands until 
> all
> + *   results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).

> + *   It's legal to start submitting another batch immediately, without 
> waiting
> + *   for the results of the current batch. There's no need to end batch mode
> + *   and start it again.
> + *
> + *   If a command in a batch fails, every subsequent command up to and 
> including
> + *   the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED 
> state. If the
> + *   whole batch is processed without error, a PGresult with PGRES_BATCH_END 
> is
> + *   produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?



> +/*
> + * PQbatchQueueProcess

Out of sync.


> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.


> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{

> +     /* This command's results will come in immediately.
> +      * Initialize async result-accumulation state */
> +     pqClearAsyncResult(conn);

I'm not following?


>  /*
>   * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)

> +                     if (conn->batch_status != PQBATCH_MODE_OFF)
> +                     {
> +                             /*
> +                              * batched queries aren't followed by a Sync to 
> put us back in
> +                              * PGASYNC_IDLE state, and when we do get a 
> sync we could
> +                              * still have another batch coming after this 
> one.

This needs rephrasing.


> +                              * The connection isn't idle since we can't 
> submit new
> +                              * nonbatched commands. It isn't also busy 
> since the current
> +                              * command is done and we need to process a new 
> one.
> +                              */
> +                             conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.


> +     if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != 
> PQBATCH_MODE_OFF)
> +     {
> +             printfPQExpBuffer(&conn->errorMessage,
> +                                               libpq_gettext("Synchronous 
> command execution functions are not allowed in batch mode\n"));
> +             return false;
> +     }

Why do we need the PGASYNC_QUEUED test here?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the 
> threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +     if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> +             return(pqFlush(conn));
> +     return 0; /* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.


Falling over now. This seems like enough feedback for a bit of work
anyway.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

Reply via email to