Hi hackers,

On Mon, 24 May 2021 11:29:10 +0900
Yugo NAGATA <nag...@sraoss.co.jp> wrote:

> Hi hackers,
> 
> On Tue, 10 Mar 2020 09:48:23 +1300
> Thomas Munro <thomas.mu...@gmail.com> wrote:
> 
> > On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> > > >> Thank you very much! I'm going to send a new patch set until the end of
> > > >> this week (I'm sorry I was very busy in the release of Postgres Pro
> > > >> 11...).
> > > >
> > > > Is anyone interested in rebasing this, and summarising what needs to
> > > > be done to get it in?  It's arguably a bug or at least quite
> > > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > > > that a couple of forks already ship Marina's patch set.
> 
I got interested in this and have been looking into the patch and the past
discussion. If no one else is going to do it and there are no objections,
I would like to rebase this. Is that okay?

I rebased and fixed the previous patches (v11) written by Marina Polyakova,
and attached the revised version (v12).

v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).

v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).

These are the revised versions of v11-0002 and v11-0003. v11-0001
(for the RandomState structure) is not included because it has already
been committed (40923191944). v11-0004 (for a separate error reporting
function) is not included either because pgbench now uses the common
logging APIs (30a3e772b40).

In addition to rebasing on master, I updated the patch according to the
review from Fabien COELHO [1] and the discussions that followed it. I also
added some other fixes from my own review of the previous patch.

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.21.1809081450100.10506%40lancre

The following are fixes according to Fabien's review.

> * Features

> As far as the actual retry feature is concerned, I'd say we are nearly 
> there. However I have an issue with changing the behavior on meta command 
> and other sql errors, which I find not desirable.
...
> I do not think that these changes of behavior are desirable. Meta command and
> miscellaneous SQL errors should result in immediately aborting the whole run,
> because the client test code itself could not run correctly or the SQL sent
> was somehow wrong, which is also the client's fault, and the server 
> performance bench does not make much sense in such conditions.
> 
> ISTM that the focus of this patch should only be to handle some server 
> runtime errors that can be retried, but not to change pgbench behavior on 
> other kind of errors. If these are to be changed, ISTM that it would be a 
> distinct patch and would require some discussion, and possibly an option 
> to enable it or not if some use case emerge. AFA this patch is concerned, 
> I'd suggest to let that out.

Previously, all SQL and meta command errors could be retried; I changed this
so that only serialization and deadlock errors can be retried.
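For reference, retryability is decided from the SQLSTATE returned by the
server; the patch implements this in getSQLErrorStatus() and canRetryError().
A minimal sketch of the check:

    #define ERRCODE_T_R_SERIALIZATION_FAILURE  "40001"
    #define ERRCODE_T_R_DEADLOCK_DETECTED      "40P01"

    const char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
    bool        can_retry = sqlState != NULL &&
        (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0 ||
         strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0);

All other SQL errors, and all meta command errors, abort the client as before.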

> Doc says "you cannot use an infinite number of retries without 
> latency-limit..."
> 
> Why should this be forbidden? At least if -T timeout takes precedent and
> shortens the execution, ISTM that there could be good reason to test that.
> Maybe it could be blocked only under -t if this would lead to an non-ending
> run.

I changed this to allow --max-tries to be used with the -T option even if
--latency-limit is not used.
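
For example, "pgbench --max-tries=0 -T 60" is now accepted: transactions
that hit serialization or deadlock errors are retried with no limit on the
number of tries until the 60-second duration expires.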

> As "--print-errors" is really for debug, maybe it could be named
> "--debug-errors". I'm not sure that having "--debug" implying this option
> is useful: As there are two distinct options, the user may be allowed
> to trigger one or the other as they wish?

The --print-errors option was renamed to --debug-errors.

> makeVariableValue error message is not for debug, but must be kept in all
> cases, and the false returned must result in an immediate abort. Same thing 
> about
> lookupCreateVariable, an invalid name is a user error which warrants an 
> immediate
> abort. Same thing again about coerce* functions or evalStandardFunc...
> Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.

"DEBUG_ERRORS" messages unrelated to serialization & deadlock errors were 
removed.

> sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff 
> is
> really about the transaction script, not dealing with errors, esp as there is 
> no
> significant advantage in preparing a "ROLLBACK" statement which is short and 
> has
> no parameters. I'd suggest to remove this function and just issue
> PQsendQuery("ROLLBACK;") in all cases.

Now, we just issue PQsendQuery("ROLLBACK;").
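
For reference, the rollback is only sent when the client is still inside a
(possibly failed) transaction block; a rough sketch, using the patch's
checkTransactionStatus() and the client states:

    bool    in_tx_block;

    if (!checkTransactionStatus(st->con, &in_tx_block))
        st->state = CSTATE_ABORTED;           /* broken connection, etc. */
    else if (in_tx_block && PQsendQuery(st->con, "ROLLBACK;"))
        st->state = CSTATE_WAIT_ROLLBACK_RESULT;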

> In copyVariables, I'd simplify
>
>  + if (source_var->svalue == NULL)
>  +   dest_var->svalue = NULL;
>  + else
>  +   dest_var->svalue = pg_strdup(source_var->svalue);
>
> as:
>   dest_var->value = (source_var->svalue == NULL) ? NULL : 
> pg_strdup(source_var->svalue);

Fixed using a ternary operator.

>  + if (sqlState)   ->   if (sqlState != NULL) ?

Fixed.

> Function getTransactionStatus name does not seem to correspond fully to what 
> the
> function does. There is a passthru case which should be either avoided or
> clearly commented.

This was renamed to checkTransactionStatus according to [2].

[2] 
https://www.postgresql.org/message-id/c262e889315625e0fc0d77ca78fe2eac%40postgrespro.ru

>  - commandFailed(st, "SQL", "perhaps the backend died while processing");
>  + clientAborted(st,
>  +              "perhaps the backend died while processing");
>
> keep on one line?

The change that replaced commandFailed() with clientAborted() was removed
(see below).

>  + if (doRetry(st, &now))
>  +   st->state = CSTATE_RETRY;
>  + else
>  +   st->state = CSTATE_FAILURE;
>
> -> st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;

Fixed using a ternary operator.

> * Comments

> "There're different types..." -> "There are different types..."
> "after the errors and"... -> "after errors and"...
> "the default value of max_tries is set to 1" -> "the default value
> of max_tries is 1"
> "We cannot retry the transaction" -> "We cannot retry a transaction"
> "may ultimately succeed or get a failure," -> "may ultimately succeed or 
> fail,"

Fixed.

> Overall, the comment text in StatsData is very clear. However they are not
> clearly linked to the struct fields. I'd suggest that each field when used
> should be quoted, so as to separate English from code, and the struct name
> should always be used explicitly when possible.

The comments in StatsData were fixed to clarify what each field in this
struct represents.

> I'd insist in a comment that "cnt" does not include "skipped" transactions
> (anymore).

StatsData.cnt has a comment "number of successful transactions, not including
'skipped'", and CState.cnt has a comment "skipped and failed transactions are
also counted here".

> * Documentation:

> ISTM that there are too many "the":
>   - "turns on the option ..." -> "turns on option ..."
>   - "When the option ..." -> "When option ..."
>   - "By default the option ..." -> "By default option ..."
>   - "only if the option ..." -> "only if option ..."
>   - "combined with the option ..." -> "combined with option ..."
>   - "without the option ..." -> "without option ..."

The previous patch used "the option xxxx" in many places, but I changed
these to "the xxxx option" because I found that the existing documentation
refers to options in this way. For example,

- You can (and, for most purposes, probably should) increase the number
  of rows by using the <option>-s</option> (scale factor) option. 
- The prefix can be changed by using the <option>--log-prefix</option> option.
- If the <option>-j</option> option is 2 or higher, so that there are multiple
  worker threads,

>   - "is the sum of all the retries" -> "is the sum of all retries"
> "infinite" -> "unlimited" 
> "not retried at all" -> "not retried" (maybe several times). 
> "messages of all errors" -> "messages about all errors". 
> "It is assumed that the scripts used do not contain" ->
> "It is assumed that pgbench scripts do not contai

Fixed.


The following are additional fixes based on my review of the previous patch.

* About error reporting

In the previous patch, commandFailed() was changed to report an error
that doesn't immediately abort the client, and clientAborted() was
added to report a client abort. In the attached patch, the behavior
around errors other than serialization and deadlock is unchanged and
such errors cause the client to abort, so commandFailed() is used
without any changes to report a client abort, and commandError() is
added to report an error that can be retried under --debug-errors.

* About progress reporting

In the previous patch, the number of failures was reported only when a
transaction failed, and retry statistics were reported only when a
transaction was retried. This means the number of columns in the report
differed depending on the interval, which was odd and made the output
harder to parse.

In the attached patch, the number of failures is always reported, and
the retry statistics are reported when --max-tries is not 1.
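
For illustration, a progress line now always carries a failure count, and
the retry columns are appended when --max-tries is not 1; with made-up
numbers it looks roughly like:

    progress: 10.0 s, 413.2 tps, lat 16.1 ms stddev 21.0, 27 failed, 129 retried, 341 retries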

* About result outputs

In the previous patch, the number of failed transactions, the number of
retried transactions, and the total number of retries were reported as:

 number of failures: 324 (3.240%)
 ...
 number of retried: 5629 (56.290%)
 number of retries: 103299

I think this was confusing. In particular, it was unclear to me what
"retried" and "retries" represent respectively. Therefore, in the
attached patch, they are reported as:

 number of transactions failed: 324 (3.240%)
 ...
 number of transactions retried: 5629 (56.290%)
 number of total retries: 103299

which clarifies that the first two are numbers of transactions and the
last one is the number of retries over all transactions.

* About average connection time

In the previous patch, this was calculated as "conn_total_duration / total->cnt",
where conn_total_duration is the cumulative connection time summed over
threads and total->cnt is the number of successfully processed transactions.

However, the average connection time could be overestimated because
conn_total_duration includes the connection time of transactions that
failed due to serialization and deadlock errors. So, in the attached
patch, this is calculated as "conn_total_duration / (total->cnt + failures)".
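
In code form, the new computation is roughly as follows (a sketch; 'failed'
here stands for the sum of the serialization and deadlock failure counts,
and conn_total_duration is in microseconds):

    int64   failed = total->serialization_failures + total->deadlock_failures;
    double  avg_conn_ms = 0.001 * conn_total_duration / (total->cnt + failed);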


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nag...@sraoss.co.jp>
From ae18f7445eff881800a398c27c850806048b060f Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Fri, 28 May 2021 10:48:57 +0900
Subject: [PATCH v12 2/2] Pgbench errors and serialization/deadlock retries

A client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of the script was
reached without completing the last transaction. In addition, if execution
of an SQL or meta command fails for reasons other than serialization or
deadlock errors, the client is aborted. Otherwise, if an SQL command fails
with a serialization or deadlock error, the current transaction is rolled
back, which also includes setting the client variables as they were before
the run of this transaction (it is assumed that one transaction script
contains only one transaction).

Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option).  These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.

If there are retries and/or failures, their statistics are printed in the
progress reports, in the transaction / aggregation logs, and at the end with
the other results (overall and for each script). Retries and failures are also printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.

If you want to distinguish all errors and failures (errors without retrying) by
type including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the option --debug-errors.
---
 doc/src/sgml/ref/pgbench.sgml                | 399 +++++++-
 src/bin/pgbench/pgbench.c                    | 952 +++++++++++++++++--
 src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  10 +
 src/fe_utils/conditional.c                   |  16 +-
 src/include/fe_utils/conditional.h           |   2 +
 6 files changed, 1480 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..6811a6b29c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 10000/10000
+maximum number of tries: 1
 latency average = 11.013 ms
 latency stddev = 7.351 ms
 initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
 </screen>
 
   The first six lines report some of the most important parameter
-  settings.  The next line reports the number of transactions completed
+  settings.  The seventh line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
-  failed before completion.  (In <option>-T</option> mode, only the actual
-  number of transactions is printed.)
+  failed before completion or some SQL command(s) failed.  (In
+  <option>-T</option> mode, only the actual number of transactions is printed.)
+  The next line reports the maximum number of tries for transactions with
+  serialization or deadlock errors (see <xref linkend="failures-and-retries"
+  endterm="failures-and-retries-title"/> for more information).
   The last line reports the number of transactions per second.
  </para>
 
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         at all. They are counted and reported separately as
         <firstterm>skipped</firstterm>.
        </para>
+       <para>
+        When the <option>--max-tries</option> option is used, the transaction with
+        serialization or deadlock error cannot be retried if the total time of
+        all its tries is greater than <replaceable>limit</replaceable> ms. To
+        limit only the time of tries and not their number, use
+        <literal>--max-tries=0</literal>. By default option
+        <option>--max-tries</option> is set to 1 and transactions with
+        serialization/deadlock errors are not retried. See <xref
+        linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+        for more information about retrying such transactions.
+       </para>
        </listitem>
      </varlistentry>
 
@@ -594,23 +609,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
        <para>
         Show progress report every <replaceable>sec</replaceable> seconds.  The report
         includes the time since the beginning of the run, the TPS since the
-        last report, and the transaction latency average and standard
-        deviation since the last report.  Under throttling (<option>-R</option>),
-        the latency is computed with respect to the transaction scheduled
-        start time, not the actual transaction beginning time, thus it also
-        includes the average schedule lag time.
+        last report, and the transaction latency average, standard deviation,
+        and the number of failed transactions since the last report. Under
+        throttling (<option>-R</option>), the latency is computed with respect
+        to the transaction scheduled start time, not the actual transaction
+        beginning time, thus it also includes the average schedule lag time.
+        When <option>--max-tries</option> is used to enable transaction retries
+        after serialization/deadlock errors, the report includes the number of
+        retried transactions and the sum of all retries.
        </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-r</option></term>
-      <term><option>--report-latencies</option></term>
+      <term><option>--report-per-command</option></term>
       <listitem>
        <para>
-        Report the average per-statement latency (execution time from the
-        perspective of the client) of each command after the benchmark
-        finishes.  See below for details.
+        Report the following statistics for each command after the benchmark
+        finishes: the average per-statement latency (execution time from the
+        perspective of the client), the number of failures and the number of
+        retries after serialization or deadlock errors in this command.  The
+        report displays retry statistics only if the 
+        <option>--max-tries</option> option is not equal to 1.
        </para>
       </listitem>
      </varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--failures-detailed</option></term>
+      <listitem>
+       <para>
+        Report failures in per-transaction and aggregation logs, as well as in
+        the main and per-script reports, grouped by the following types:
+        <itemizedlist>
+         <listitem>
+          <para>serialization failures;</para>
+         </listitem>
+         <listitem>
+          <para>deadlock failures;</para>
+         </listitem>
+        </itemizedlist>
+        See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
       <listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+      <listitem>
+       <para>
+        Enable retries for transactions with serialization/deadlock errors and
+        set the maximum number of these tries. This option can be combined with
+        the <option>--latency-limit</option> option which limits the total time
+        of all transaction tries; moreover, you cannot use an unlimited number
+        of tries (<literal>--max-tries=0</literal>) without 
+        <option>--latency-limit</option> or <option>--time</option>.
+        The default value is 1 and transactions with serialization/deadlock
+        errors are not retried. See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information about
+        retrying such transactions.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--debug-errors</option></term>
+      <listitem>
+       <para>
+        Print messages about all errors and failures (errors without retrying)
+        including which limit for retries was violated and how far it was
+        exceeded for the serialization/deadlock failures. (Note that in this
+        case the output can be significantly increased.)
+        See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--progress-timestamp</option></term>
       <listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
  <refsect1>
   <title>Notes</title>
 
- <refsect2>
-  <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+  <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
 
   <para>
    <application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
     both old and new versions of <application>pgbench</application>, be sure to write
     each SQL command on a single line ending with a semicolon.
    </para>
+   <para>
+    It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+    transactions. If at runtime the client reaches the end of the script without
+    completing the last transaction block, it will be aborted.
+   </para>
   </note>
 
   <para>
@@ -2207,7 +2285,7 @@ END;
    The format of the log is:
 
 <synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
 </synopsis>
 
    where
@@ -2228,6 +2306,17 @@ END;
    When both <option>--rate</option> and <option>--latency-limit</option> are used,
    the <replaceable>time</replaceable> for a skipped transaction will be reported as
    <literal>skipped</literal>.
+   <replaceable>retries</replaceable> is the sum of all retries after the
+   serialization or deadlock errors during the current script execution. It is
+   present only if the <option>--max-tries</option> option is not equal to 1.
+   If the transaction ends with a failure, its <replaceable>time</replaceable>
+   will be reported as <literal>failed</literal>. If you use the 
+   <option>--failures-detailed</option> option, the
+   <replaceable>time</replaceable> of the failed transaction will be reported as
+   <literal>serialization_failure</literal> or 
+   <literal>deadlock_failure</literal> depending on the type of failure (see
+   <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+   for more information).
   </para>
 
   <para>
@@ -2256,6 +2345,24 @@ END;
    were already late before they were even started.
   </para>
 
+  <para>
+   The following example shows a snippet of a log file with failures and
+   retries, with the maximum number of tries set to 10 (note the additional
+   <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+  </para>
+
   <para>
    When running a long test on hardware that can handle a lot of transactions,
    the log files can become very large.  The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
    format is used for the log files:
 
 <synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
 </synopsis>
 
    where
@@ -2285,7 +2392,16 @@ END;
    transaction latencies within the interval,
    <replaceable>min_latency</replaceable> is the minimum latency within the interval,
    and
-   <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+   <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+   <replaceable>failures</replaceable> is the number of transactions that ended
+   with a failed SQL command within the interval. If you use option
+   <option>--failures-detailed</option>, instead of the sum of all failed
+   transactions you will get more detailed statistics for the failed
+   transactions grouped by the following types:
+   <replaceable>serialization_failures</replaceable> is the number of
+   transactions that got a serialization error and were not retried after this,
+   <replaceable>deadlock_failures</replaceable> is the number of transactions
+   that got a deadlock error and were not retried after this.
    The next fields,
    <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
    and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
    They provide statistics about the time each transaction had to wait for the
    previous one to finish, i.e., the difference between each transaction's
    scheduled start time and the time it actually started.
-   The very last field, <replaceable>skipped</replaceable>,
+   The next field, <replaceable>skipped</replaceable>,
    is only present if the <option>--latency-limit</option> option is used, too.
    It counts the number of transactions skipped because they would have
    started too late.
+   The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+   fields are present only if the <option>--max-tries</option> option is not
+   equal to 1. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.
    Each transaction is counted in the interval when it was committed.
   </para>
 
   <para>
    Here is some example output:
 <screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
 </screen></para>
 
   <para>
@@ -2319,13 +2439,44 @@ END;
  </refsect2>
 
  <refsect2>
-  <title>Per-Statement Latencies</title>
+  <title>Per-Statement Report</title>
 
   <para>
-   With the <option>-r</option> option, <application>pgbench</application> collects
-   the elapsed transaction time of each statement executed by every
-   client.  It then reports an average of those values, referred to
-   as the latency for each statement, after the benchmark has finished.
+   With the <option>-r</option> option, <application>pgbench</application>
+   collects the following statistics for each statement:
+   <itemizedlist>
+     <listitem>
+       <para>
+         <literal>latency</literal> &mdash; elapsed transaction time for each
+         statement. <application>pgbench</application> reports an average value
+         of all successful runs of the statement.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         The number of failures in this statement. See
+         <xref linkend="failures-and-retries"
+         endterm="failures-and-retries-title"/> for more information.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         The number of retries after a serialization or a deadlock error in this
+         statement. See <xref linkend="failures-and-retries"
+         endterm="failures-and-retries-title"/> for more information.
+       </para>
+     </listitem>
+   </itemizedlist>
+  </para>
+
+  <para>
+   The report displays retry statistics only if the <option>--max-tries</option>
+   option is not equal to 1.
+  </para>
+
+  <para>
+   All values are computed for each statement executed by every client and are
+   reported after the benchmark has finished.
   </para>
 
   <para>
@@ -2339,27 +2490,64 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 10000/10000
+maximum number of tries: 1
 latency average = 10.870 ms
 latency stddev = 7.341 ms
 initial connection time = 30.954 ms
 tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
-    0.001  \set aid random(1, 100000 * :scale)
-    0.001  \set bid random(1, 1 * :scale)
-    0.001  \set tid random(1, 10 * :scale)
-    0.000  \set delta random(-5000, 5000)
-    0.046  BEGIN;
-    0.151  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-    0.107  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-    4.241  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
-    5.245  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
-    0.102  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
-    0.974  END;
+statement latencies in milliseconds and failures:
+  0.002  0  \set aid random(1, 100000 * :scale)
+  0.005  0  \set bid random(1, 1 * :scale)
+  0.002  0  \set tid random(1, 10 * :scale)
+  0.001  0  \set delta random(-5000, 5000)
+  0.326  0  BEGIN;
+  0.603  0  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+  0.454  0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+  5.528  0  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+  7.335  0  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+  0.371  0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+  1.212  0  END;
 </screen>
+
+   Another example of output for the default script using serializable default
+   transaction isolation level (<command>PGOPTIONS='-c
+   default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: &lt;builtin: TPC-B (sort of)&gt;
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of transactions failed: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+number of total retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.650224 (including connections establishing)
+tps = 413.686560 (excluding connections establishing)
+statement latencies in milliseconds, failures and retries:
+  0.002    0      0  \set aid random(1, 100000 * :scale)
+  0.000    0      0  \set bid random(1, 1 * :scale)
+  0.000    0      0  \set tid random(1, 10 * :scale)
+  0.000    0      0  \set delta random(-5000, 5000)
+  0.121    0      0  BEGIN;
+  0.290    0      2  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+  0.221    0      0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+  0.266  212  72127  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+  0.222  112  31170  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+  0.178    0      0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+  1.210    0      0  END;
+ </screen>
   </para>
 
   <para>
-   If multiple script files are specified, the averages are reported
+   If multiple script files are specified, all statistics are reported
    separately for each script file.
   </para>
 
@@ -2373,6 +2561,135 @@ statement latencies in milliseconds:
   </para>
  </refsect2>
 
+ <refsect2 id="failures-and-retries">
+  <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+  <para>
+   When executing <application>pgbench</application>, there are three main types
+   of errors:
+   <itemizedlist>
+     <listitem>
+       <para>
+         Errors of the main program. They are the most serious and always result
+         in an immediate exit from the <application>pgbench</application> with
+         the corresponding error message. They include:
+         <itemizedlist>
+           <listitem>
+             <para>
+               errors at the beginning of the <application>pgbench</application>
+               (e.g. an invalid option value);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               errors in the initialization mode (e.g. the query to create
+               tables for built-in scripts fails);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               errors before starting threads (e.g. we could not connect to the
+               database server / the syntax error in the meta command / thread
+               creation failure);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               internal <application>pgbench</application> errors (which are
+               supposed to never occur...).
+             </para>
+           </listitem>
+         </itemizedlist>
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         Errors when the thread manages its clients (e.g. the client could not
+         start a connection to the database server / the socket for connecting
+         the client to the database server has become invalid). In such cases
+         all clients of this thread stop while other threads continue to work.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         Direct client errors. They lead to immediate exit from the
+         <application>pgbench</application> with the corresponding error message
+         only in the case of an internal <application>pgbench</application>
+         error (which are supposed to never occur...). Otherwise in the worst
+         case they only lead to the abortion of the failed client while other
+         clients continue their run (but some client errors are handled without
+         an abortion of the client and reported separately, see below). Later in
+         this section it is assumed that the discussed errors are only the
+         direct client errors and they are not internal
+         <application>pgbench</application> errors.
+       </para>
+     </listitem>
+   </itemizedlist>
+  </para>
+
+  <para>
+   A client's run is aborted in case of a serious error, for example, the
+   connection with the database server was lost or the end of the script was
+   reached without completing the last transaction. In addition, if execution
+   of an SQL or meta command fails for reasons other than serialization or
+   deadlock errors, the client is aborted. Otherwise, if an SQL command fails
+   with a serialization or deadlock error, the current transaction is rolled
+   back, which also includes setting the client variables as they were before the run of this
+   transaction (it is assumed that one transaction script contains only one
+   transaction; see <xref linkend="transactions-and-scripts"
+   endterm="transactions-and-scripts-title"/> for more information).
+   Transactions with serialization or deadlock errors are repeated after
+   rollbacks until they complete successfully or reach the maximum number of
+   tries (specified by the <option>--max-tries</option> option) / the maximum
+   time of tries (specified by the <option>--latency-limit</option> option). If
+   the last transaction run fails, this transaction will be reported as failed.
+  </para>
+
+  <note>
+   <para>
+    Although without the <option>--max-tries</option> option the transaction
+    will never be retried after an error, use an unlimited number of tries
+    (<literal>--max-tries=0</literal>) and the <option>--latency-limit</option>
+    option or the <option>--time</option> to limit only the maximum time of tries.
+   </para>
+   <para>
+    Be careful when repeating scripts that contain multiple transactions: the
+    script is always retried completely, so the successful transactions can be
+    performed several times.
+   </para>
+   <para>
+    Be careful when repeating transactions with shell commands. Unlike the
+    results of SQL commands, the results of shell commands are not rolled back,
+    except for the variable value of the <command>\setshell</command> command.
+   </para>
+  </note>
+
+  <para>
+   The latency of a successful transaction includes the entire time of
+   transaction execution with rollbacks and retries. The latency for failed
+   transactions and commands is not computed separately.
+  </para>
+
+  <para>
+   The main report contains the number of failed transactions if it is non-zero.
+   If the total number of retried transactions is non-zero, the main report also
+   contains the statistics related to retries: the total number of retried
+   transactions and total number of retries. The per-script report inherits all
+   these fields from the main report. The per-statement report displays retry
+   statistics only if the <option>--max-tries</option> option is not equal to 1.
+  </para>
+
+  <para>
+   If you want to group failures by basic types in per-transaction and
+   aggregation logs, as well as in the main and per-script reports, use the
+   <option>--failures-detailed</option> option. If you also want to distinguish
+   all errors and failures (errors without retrying) by type including which
+   limit for retries was violated and how far it was exceeded for the
+   serialization/deadlock failures, use the <option>--debug-errors</option>
+   option.
+  </para>
+ </refsect2>
+
  <refsect2>
   <title>Good Practices</title>
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8acda86cad..77888146e2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
 #define M_PI 3.14159265358979323846
 #endif
 
+#define ERRCODE_T_R_SERIALIZATION_FAILURE  "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED  "40P01"
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
@@ -273,9 +275,34 @@ bool		progress_timestamp = false; /* progress report with Unix time */
 int			nclients = 1;		/* number of clients */
 int			nthreads = 1;		/* number of threads */
 bool		is_connect;			/* establish connection for each transaction */
-bool		report_per_command; /* report per-command latencies */
+bool		report_per_command = false;	/* report per-command latencies, retries
+										 * after errors and failures (errors
+										 * without retrying) */
 int			main_pid;			/* main process id used in log filename */
 
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32		max_tries = 0;
+
+bool		failures_detailed = false;	/* whether to group failures in reports
+										 * or logs by basic types */
+
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
 typedef struct StatsData
 {
 	pg_time_usec_t start_time;	/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions, including skipped */
+
+	/*
+	 * Transactions are counted depending on their execution and outcome. First
+	 * a transaction may have started or not: skipped transactions occur under
+	 * --rate and --latency-limit when the client is too late to execute them.
+	 * Secondly, a started transaction may ultimately succeed or fail, possibly
+	 * after some retries when --max-tries is not one. Thus
+	 *
+	 * the number of all transactions =
+	 *   'skipped' (it was too late to execute them) +
+	 *   'cnt' (the number of successful transactions) +
+	 *   failed (the number of failed transactions).
+	 *
+	 * A successful transaction can have several unsuccessful tries before a
+	 * successful run. Thus
+	 *
+	 * 'cnt' (the number of successful transactions) =
+	 *   successfully retried transactions (they got a serialization or a
+	 *                                      deadlock error(s), but were
+	 *                                      successfully retried from the very
+	 *                                      beginning) +
+	 *   directly successful transactions (they were successfully completed on
+	 *                                     the first try).
+	 *
+	 * A failed transaction can be one of two types:
+	 *
+	 * failed (the number of failed transactions) =
+	 *   'serialization_failures' (they got a serialization error and were not
+	 *                             successfully retried) +
+	 *   'deadlock_failures' (they got a deadlock error and were not successfully
+	 *                        retried).
+	 *
+	 * If the transaction was retried after a serialization or a deadlock error
+	 * this does not guarantee that this retry was successful. Thus
+	 *
+	 * 'retries' (number of retries) =
+	 *   number of retries in all retried transactions =
+	 *   number of retries in (successfully retried transactions +
+	 *                         failed transactions);
+	 *
+	 * 'retried' (number of all retried transactions) =
+	 *   successfully retried transactions +
+	 *   failed transactions.
+	 */
+	int64		cnt;			/* number of successful transactions, not
+								 * including 'skipped' */
 	int64		skipped;		/* number of transactions skipped under --rate
 								 * and --latency-limit */
+	int64		retries;		/* number of retries after a serialization or a
+								 * deadlock error in all the transactions */
+	int64		retried;		/* number of all transactions that were retried
+								 * after a serialization or a deadlock error
+								 * (perhaps the last try was unsuccessful) */
+	int64		serialization_failures;	/* number of transactions that were not
+										 * successfully retried after a
+										 * serialization error */
+	int64		deadlock_failures;	/* number of transactions that were not
+									 * successfully retried after a deadlock
+									 * error */
 	SimpleStats latency;
 	SimpleStats lag;
 } StatsData;
@@ -375,6 +458,30 @@ typedef struct RandomState
 	unsigned short xseed[3];
 } RandomState;
 
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+	RandomState random_state;	/* random seed */
+	Variables   variables;		/* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+	ESTATUS_NO_ERROR = 0,
+	ESTATUS_META_COMMAND_ERROR,
+
+	/* SQL errors */
+	ESTATUS_SERIALIZATION_ERROR,
+	ESTATUS_DEADLOCK_ERROR,
+	ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
 /* Various random sequences are initialized from this one. */
 static RandomState base_random_sequence;
 
@@ -446,6 +553,35 @@ typedef enum
 	CSTATE_END_COMMAND,
 	CSTATE_SKIP_COMMAND,
 
+	/*
+	 * States for failed commands.
+	 *
+	 * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+	 * - clear the conditional stack;
+	 * - if we have an unterminated (possibly failed) transaction block, send
+	 * the rollback command to the server and wait for the result in
+	 * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+	 * go to CSTATE_ABORTED.
+	 *
+	 * But if everything is ok we are ready for future transactions: if this is
+	 * a serialization or deadlock error and we can re-execute the transaction
+	 * from the very beginning, go to CSTATE_RETRY; otherwise go to
+	 * CSTATE_FAILURE.
+	 *
+	 * In CSTATE_RETRY report an error, set the same parameters for the
+	 * transaction execution as in the previous tries and process the first
+	 * transaction command in CSTATE_START_COMMAND.
+	 *
+	 * In CSTATE_FAILURE report a failure, set the parameters for the
+	 * transaction execution as they were before the first run of this
+	 * transaction (except for a random state) and go to CSTATE_END_TX to
+	 * complete this transaction.
+	 */
+	CSTATE_ERROR,
+	CSTATE_WAIT_ROLLBACK_RESULT,
+	CSTATE_RETRY,
+	CSTATE_FAILURE,
+
 	/*
 	 * CSTATE_END_TX performs end-of-transaction processing.  It calculates
 	 * latency, and logs the transaction.  In --connect mode, it closes the
@@ -494,8 +630,21 @@ typedef struct
 
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
+	/*
+	 * For processing failures and repeating transactions with serialization or
+	 * deadlock errors:
+	 */
+	EStatus		estatus;	/* the error status of the current transaction
+							 * execution; this is ESTATUS_NO_ERROR if there were
+							 * no errors */
+	RetryState  retry_state;
+	uint32			retries;	/* how many times have we already retried the
+								 * current transaction after a serialization or
+								 * a deadlock error? */
+
 	/* per client collected stats */
-	int64		cnt;			/* client transaction count, for -t */
+	int64		cnt;			/* client transaction count, for -t; skipped and
+								 * failed transactions are also counted here */
 } CState;
 
 /*
@@ -590,6 +739,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * aset			do gset on all possible queries of a combined query (\;).
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
+ * retries		Number of retries after a serialization or deadlock error in the
+ *				current command.
+ * failures		Number of errors in the current command that were not retried.
  */
 typedef struct Command
 {
@@ -602,6 +754,8 @@ typedef struct Command
 	char	   *varprefix;
 	PgBenchExpr *expr;
 	SimpleStats stats;
+	int64		retries;
+	int64		failures;
 } Command;
 
 typedef struct ParsedScript
@@ -616,6 +770,8 @@ static ParsedScript sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int64 total_weight = 0;
 
+static bool	debug_errors = false;	/* print debug messages of all errors */
+
 /* Builtin test scripts */
 typedef struct BuiltinScript
 {
@@ -753,15 +909,18 @@ usage(void)
 		   "                           protocol for submitting queries (default: simple)\n"
 		   "  -n, --no-vacuum          do not run VACUUM before tests\n"
 		   "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
-		   "  -r, --report-latencies   report average latency per command\n"
+		   "  -r, --report-per-command report latencies, failures and retries per command\n"
 		   "  -R, --rate=NUM           target rate in transactions per second\n"
 		   "  -s, --scale=NUM          report this scale factor in output\n"
 		   "  -t, --transactions=NUM   number of transactions each client runs (default: 10)\n"
 		   "  -T, --time=NUM           duration of benchmark test in seconds\n"
 		   "  -v, --vacuum-all         vacuum all four standard tables before tests\n"
 		   "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
+		   "  --failures-detailed      report the failures grouped by basic types\n"
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
+		   "  --max-tries=NUM          max number of tries to run transaction (default: 1)\n"
+		   "  --debug-errors           print messages of all errors\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
 		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1307,6 +1466,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
 	sd->start_time = start;
 	sd->cnt = 0;
 	sd->skipped = 0;
+	sd->retries = 0;
+	sd->retried = 0;
+	sd->serialization_failures = 0;
+	sd->deadlock_failures = 0;
 	initSimpleStats(&sd->latency);
 	initSimpleStats(&sd->lag);
 }
@@ -1315,22 +1478,49 @@ initStats(StatsData *sd, pg_time_usec_t start)
  * Accumulate one additional item into the given stats object.
  */
 static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+		   EStatus estatus, int64 retries)
 {
-	stats->cnt++;
-
+	/* Record the skipped transaction */
 	if (skipped)
 	{
 		/* no latency to record on skipped transactions */
 		stats->skipped++;
+		return;
 	}
-	else
+
+	/*
+	 * Record the number of retries regardless of whether the transaction was
+	 * successful or failed.
+	 */
+	stats->retries += retries;
+	if (retries > 0)
+		stats->retried++;
+
+	switch (estatus)
 	{
-		addToSimpleStats(&stats->latency, lat);
+			/* Record the successful transaction */
+		case ESTATUS_NO_ERROR:
+			stats->cnt++;
 
-		/* and possibly the same for schedule lag */
-		if (throttle_delay)
-			addToSimpleStats(&stats->lag, lag);
+			addToSimpleStats(&stats->latency, lat);
+
+			/* and possibly the same for schedule lag */
+			if (throttle_delay)
+				addToSimpleStats(&stats->lag, lag);
+			break;
+
+			/* Record the failed transaction */
+		case ESTATUS_SERIALIZATION_ERROR:
+			stats->serialization_failures++;
+			break;
+		case ESTATUS_DEADLOCK_ERROR:
+			stats->deadlock_failures++;
+			break;
+		default:
+			/* internal error which should never occur */
+			pg_log_fatal("unexpected error status: %d", estatus);
+			exit(1);
 	}
 }
 
@@ -2865,6 +3055,9 @@ preparedStatementName(char *buffer, int file, int state)
 	sprintf(buffer, "P%d_%d", file, state);
 }
 
+/*
+ * Report the abortion of the client when processing SQL commands.
+ */
 static void
 commandFailed(CState *st, const char *cmd, const char *message)
 {
@@ -2872,6 +3065,19 @@ commandFailed(CState *st, const char *cmd, const char *message)
 				 st->id, st->command, cmd, st->use_file, message);
 }
 
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+	const Command *command = sql_script[st->use_file].commands[st->command];
+
+	Assert(command->type == SQL_COMMAND);
+	pg_log_error("client %d got an error in command %d (SQL) of script %d; %s",
+				 st->id, st->command, st->use_file, message);
+}
+
 /* return a script number with a weighted choice. */
 static int
 chooseScript(TState *thread)
@@ -2979,6 +3185,33 @@ sendCommand(CState *st, Command *command)
 		return true;
 }
 
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+	if (sqlState != NULL)
+	{
+		if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+			return ESTATUS_SERIALIZATION_ERROR;
+		else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+			return ESTATUS_DEADLOCK_ERROR;
+	}
+
+	return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+	return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+			estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
 /*
  * Process query response from the backend.
  *
@@ -3021,6 +3254,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 				{
 					pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
 								 st->id, st->use_file, st->command, qrynum, 0);
+					st->estatus = ESTATUS_META_COMMAND_ERROR;
 					goto error;
 				}
 				break;
@@ -3035,6 +3269,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 						/* under \gset, report the error */
 						pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
 									 st->id, st->use_file, st->command, qrynum, PQntuples(res));
+						st->estatus = ESTATUS_META_COMMAND_ERROR;
 						goto error;
 					}
 					else if (meta == META_ASET && ntuples <= 0)
@@ -3059,6 +3294,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 							/* internal error */
 							pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
 										 st->id, st->use_file, st->command, qrynum, varname);
+							st->estatus = ESTATUS_META_COMMAND_ERROR;
 							goto error;
 						}
 
@@ -3076,6 +3312,20 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 								 PQerrorMessage(st->con));
 				break;
 
+			case PGRES_NONFATAL_ERROR:
+			case PGRES_FATAL_ERROR:
+				st->estatus = getSQLErrorStatus(
+					PQresultErrorField(res, PG_DIAG_SQLSTATE));
+				if (canRetryError(st->estatus))
+				{
+					if (debug_errors)
+						commandError(st, PQerrorMessage(st->con));
+					if (PQpipelineStatus(st->con) == PQ_PIPELINE_ABORTED)
+						PQpipelineSync(st->con);
+					goto error;
+				}
+				/* fall through */
+
 			default:
 				/* anything else is unexpected */
 				pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3154,6 +3404,160 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
 	return true;
 }
 
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+	Variable   *vars,
+			   *var;
+	int			nvars;
+
+	if (!variables)
+		return;					/* nothing to do here */
+
+	vars = variables->vars;
+	nvars = variables->nvars;
+	for (var = vars; var - vars < nvars; ++var)
+	{
+		pg_free(var->name);
+		pg_free(var->svalue);
+	}
+
+	variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of variables array.
+ * Before copying the function frees the string fields of the destination
+ * variables and if necessary enlarges their array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+	Variable   *dest_var;
+	const Variable *source_var;
+
+	if (!dest || !source || dest == source)
+		return;					/* nothing to do here */
+
+	/*
+	 * Clear the original variables and make sure that we have enough space for
+	 * the new variables.
+	 */
+	clearVariables(dest);
+	enlargeVariables(dest, source->nvars);
+
+	/* Make a deep copy of variables array */
+	for (source_var = source->vars, dest_var = dest->vars;
+		 source_var - source->vars < source->nvars;
+		 ++source_var, ++dest_var)
+	{
+		dest_var->name = pg_strdup(source_var->name);
+		dest_var->svalue = (source_var->svalue == NULL) ?
+			NULL : pg_strdup(source_var->svalue);
+		dest_var->value = source_var->value;
+	}
+	dest->nvars = source->nvars;
+	dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Returns true if the failed transaction can be retried, considering the
+ * error type and the limits on retrying (number of tries, latency limit,
+ * duration).
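+ *
+ * For instance (hypothetical options): with --max-tries=3 a transaction is
+ * given up after its third try; with --max-tries=0 --latency-limit=10 it is
+ * retried only while it stays within 10 ms of its scheduled start time.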
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+	Assert(st->estatus != ESTATUS_NO_ERROR);
+
+	/* We can only retry serialization or deadlock errors. */
+	if (!canRetryError(st->estatus))
+		return false;
+
+	/*
+	 * We must have at least one option to limit the retrying of transactions
+	 * that got an error.
+	 */
+	Assert(max_tries || latency_limit || duration > 0);
+
+	/*
+	 * We cannot retry the error if we have reached the maximum number of
+	 * tries, or if the timer is exceeded.
+	 */
+	if ((max_tries && st->retries + 1 >= max_tries) || timer_exceeded)
+		return false;
+
+	/*
+	 * We cannot retry the error if we spent too much time on this transaction.
+	 */
+	if (latency_limit)
+	{
+		pg_time_now_lazy(now);
+		if (*now - st->txn_scheduled > latency_limit)
+			return false;
+	}
+
+	/* OK */
+	return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+	PGTransactionStatusType tx_status;
+
+	tx_status = PQtransactionStatus(con);
+	switch (tx_status)
+	{
+		case PQTRANS_IDLE:
+			*in_tx_block = false;
+			break;
+		case PQTRANS_INTRANS:
+		case PQTRANS_INERROR:
+			*in_tx_block = true;
+			break;
+		case PQTRANS_UNKNOWN:
+			/* PQTRANS_UNKNOWN is expected given a broken connection */
+			if (PQstatus(con) == CONNECTION_BAD)
+			{		/* there's something wrong */
+				pg_log_error("perhaps the backend died while processing");
+				return false;
+			}
+			/* fall through */
+		case PQTRANS_ACTIVE:
+		default:
+			/*
+			 * We cannot find out whether we are in a transaction block or not.
+			 * Internal error which should never occur.
+			 */
+			pg_log_error("unexpected transaction status %d", tx_status);
+			return false;
+	}
+
+	/* OK */
+	return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
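+ *
+ * For example (hypothetical values): a transaction scheduled 5 ms ago with
+ * --latency-limit=20 yields 25.0.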
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+	if (!latency_limit)
+		return 0.0;
+
+	pg_time_now_lazy(now);
+	return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
+
 /*
  * Advance the state machine of a connection.
  */
@@ -3183,6 +3587,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	for (;;)
 	{
 		Command    *command;
+		PGresult   *res;
+		bool		in_tx_block;
 
 		switch (st->state)
 		{
@@ -3191,6 +3597,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 				st->use_file = chooseScript(thread);
 				Assert(conditional_stack_empty(st->cstack));
 
+				/* reset transaction variables to default values */
+				st->estatus = ESTATUS_NO_ERROR;
+				st->retries = 0;
+
 				pg_log_debug("client %d executing script \"%s\"",
 							 st->id, sql_script[st->use_file].desc);
 
@@ -3227,6 +3637,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					memset(st->prepared, 0, sizeof(st->prepared));
 				}
 
+				/*
+				 * This is the first try of this transaction. Remember its
+				 * parameters in case it gets an error and we need to run it
+				 * again.
+				 */
+				st->retry_state.random_state = st->cs_func_rs;
+				copyVariables(&st->retry_state.variables, &st->variables);
+
 				/* record transaction start time */
 				st->txn_begin = now;
 
@@ -3378,6 +3796,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					 * - else CSTATE_END_COMMAND
 					 */
 					st->state = executeMetaCommand(st, &now);
+					if (st->state == CSTATE_ABORTED)
+						st->estatus = ESTATUS_META_COMMAND_ERROR;
 				}
 
 				/*
@@ -3516,10 +3936,55 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
 						st->state = CSTATE_END_COMMAND;
 				}
+				else if (canRetryError(st->estatus))
+					st->state = CSTATE_ERROR;
 				else
 					st->state = CSTATE_ABORTED;
 				break;
 
+				/*
+				 * Wait for the rollback command to complete
+				 */
+			case CSTATE_WAIT_ROLLBACK_RESULT:
+				pg_log_debug("client %d receiving", st->id);
+				if (!PQconsumeInput(st->con))
+				{
+					pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+								 st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+				if (PQisBusy(st->con))
+					return;		/* don't have the whole result yet */
+
+				/*
+				 * Read and discard the query result.
+				 */
+				res = PQgetResult(st->con);
+				switch (PQresultStatus(res))
+				{
+					case PGRES_COMMAND_OK:
+						/* OK */
+						PQclear(res);
+						do
+						{
+							res = PQgetResult(st->con);
+							if (res)
+								PQclear(res);
+						} while (res);
+						/* Check if we can retry the error. */
+						st->state =
+							doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+						break;
+					default:
+						pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+									 st->id, PQerrorMessage(st->con));
+						PQclear(res);
+						st->state = CSTATE_ABORTED;
+						break;
+				}
+				break;
+
 				/*
 				 * Wait until sleep is done. This state is entered after a
 				 * \sleep metacommand. The behavior is similar to
@@ -3562,6 +4027,132 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
 				break;
 
+				/*
+				 * Clean up after an error.
+				 */
+			case CSTATE_ERROR:
+
+				Assert(st->estatus != ESTATUS_NO_ERROR);
+
+				/* Clear the conditional stack */
+				conditional_stack_reset(st->cstack);
+
+				/*
+				 * Check if we are in a (failed) transaction block, and roll it
+				 * back if so.
+				 */
+
+				if (!checkTransactionStatus(st->con, &in_tx_block))
+				{
+					/*
+					 * Something is wrong; checkTransactionStatus() has already
+					 * printed a more detailed error message.
+					 */
+					pg_log_error("client %d aborted while receiving the transaction status", st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+
+				if (in_tx_block)
+				{
+					/* Try to rollback a (failed) transaction block. */
+					if (!PQsendQuery(st->con, "ROLLBACK"))
+					{
+						pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+									 st->id);
+						st->state = CSTATE_ABORTED;
+					}
+					else
+						st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+				}
+				else
+				{
+					/* Check if we can retry the error. */
+					st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+				}
+				break;
+
+				/*
+				 * Retry the transaction after an error.
+				 */
+			case CSTATE_RETRY:
+				command = sql_script[st->use_file].commands[st->command];
+
+				/* Accumulate the retry. */
+				st->retries++;
+				if (report_per_command)
+					command->retries++;
+
+				/*
+				 * Report that the transaction will be retried after the error.
+				 */
+				if (debug_errors)
+				{
+					fprintf(stderr,
+							"client %d repeats the transaction after the error (try %d",
+							st->id, st->retries);
+					if (max_tries)
+						fprintf(stderr, "/%d", max_tries);
+					if (latency_limit)
+						fprintf(stderr,
+								", %.3f%% of the maximum time of tries was used",
+								getLatencyUsed(st, &now));
+					fprintf(stderr, ")\n");
+				}
+
+				/*
+				 * Reset the execution parameters to what they were at the
+				 * beginning of the transaction.
+				 */
+				st->cs_func_rs = st->retry_state.random_state;
+				copyVariables(&st->variables, &st->retry_state.variables);
+
+				/* Process the first transaction command. */
+				st->command = 0;
+				st->estatus = ESTATUS_NO_ERROR;
+				st->state = CSTATE_START_COMMAND;
+				break;
+
+				/*
+				 * Complete the failed transaction.
+				 */
+			case CSTATE_FAILURE:
+				command = sql_script[st->use_file].commands[st->command];
+
+				/* Accumulate the failure. */
+				if (report_per_command)
+					command->failures++;
+
+				/*
+				 * Report that the failed transaction will not be retried.
+				 */
+				if (debug_errors)
+				{
+					fprintf(stderr,
+							"client %d ends the failed transaction (try %d",
+							st->id, st->retries + 1);
+					if (max_tries)
+						fprintf(stderr, "/%d", max_tries);
+					if (latency_limit)
+						fprintf(stderr,
+								", %.3f%% of the maximum time of tries was used",
+								getLatencyUsed(st, &now));
+					else if (timer_exceeded)
+						fprintf(stderr,", the duration time is exceeded");
+					fprintf(stderr, ")\n");
+				}
+
+				/*
+				 * Reset the execution parameters to what they were at the
+				 * beginning of the transaction, except for the random state.
+				 */
+				copyVariables(&st->variables, &st->retry_state.variables);
+
+				/* End the failed transaction. */
+				st->state = CSTATE_END_TX;
+				break;
+
 				/*
 				 * End of transaction (end of script, really).
 				 */
@@ -3576,6 +4167,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 				 */
 				Assert(conditional_stack_empty(st->cstack));
 
+				/*
+				 * We must complete all the transaction blocks that were
+				 * started in this script.
+				 */
+				if (!checkTransactionStatus(st->con, &in_tx_block))
+				{
+					/*
+					 * Something is wrong; checkTransactionStatus() has already
+					 * printed a more detailed error message.
+					 */
+					pg_log_error("client %d aborted while receiving the transaction status", st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+				if (in_tx_block)
+				{
+					pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+								 st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+
 				if (is_connect)
 				{
 					finishCon(st);
@@ -3807,6 +4421,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	return CSTATE_END_COMMAND;
 }
 
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+	return (stats->serialization_failures +
+			stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that was not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+	if (skipped)
+		return "skipped";
+	else if (failures_detailed)
+	{
+		switch (estatus)
+		{
+			case ESTATUS_SERIALIZATION_ERROR:
+				return "serialization_failure";
+			case ESTATUS_DEADLOCK_ERROR:
+				return "deadlock_failure";
+			default:
+				/* internal error which should never occur */
+				pg_log_fatal("unexpected error status: %d", estatus);
+				exit(1);
+		}
+	}
+	else
+		return "failed";
+}
+
 /*
  * Print log entry after completing one transaction.
  *
@@ -3851,6 +4502,14 @@ doLog(TState *thread, CState *st,
 					agg->latency.sum2,
 					agg->latency.min,
 					agg->latency.max);
+
+			if (failures_detailed)
+				fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+						agg->serialization_failures,
+						agg->deadlock_failures);
+			else
+				fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
 			if (throttle_delay)
 			{
 				fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3861,6 +4520,10 @@ doLog(TState *thread, CState *st,
 				if (latency_limit)
 					fprintf(logfile, " " INT64_FORMAT, agg->skipped);
 			}
+			if (max_tries != 1)
+				fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+						agg->retried,
+						agg->retries);
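+
+			/*
+			 * E.g. (hypothetical values) with --failures-detailed,
+			 * --max-tries=2 and no throttling, an aggregate line could be:
+			 *   1650260552 5178 26171317 177284491 1136 44462 3 0 4 4
+			 * i.e. interval start, count, latency sum and sum of squares,
+			 * min, max, serialization and deadlock failures, retried,
+			 * retries.
+			 */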
 			fputc('\n', logfile);
 
 			/* reset data and move to next interval */
@@ -3868,22 +4531,26 @@ doLog(TState *thread, CState *st,
 		}
 
 		/* accumulate the current transaction */
-		accumStats(agg, skipped, latency, lag);
+		accumStats(agg, skipped, latency, lag, st->estatus, st->retries);
 	}
 	else
 	{
 		/* no, print raw transactions */
-		if (skipped)
-			fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
-					INT64_FORMAT,
-					st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
-		else
+		if (!skipped && st->estatus == ESTATUS_NO_ERROR)
 			fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
 					INT64_FORMAT,
 					st->id, st->cnt, latency, st->use_file,
 					now / 1000000, now % 1000000);
+		else
+			fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+					INT64_FORMAT,
+					st->id, st->cnt, getResultString(skipped, st->estatus),
+					st->use_file, now / 1000000, now % 1000000);
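+
+		/*
+		 * E.g. (hypothetical values) with --failures-detailed and
+		 * --max-tries=2, a failed transaction could be logged as:
+		 *   1 56 serialization_failure 0 1650260552 350572 1
+		 * where the trailing 1 is the retry count printed below.
+		 */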
+
 		if (throttle_delay)
 			fprintf(logfile, " %.0f", lag);
+		if (max_tries != 1)
+			fprintf(logfile, " %d", st->retries);
 		fputc('\n', logfile);
 	}
 }
@@ -3892,7 +4559,8 @@ doLog(TState *thread, CState *st,
  * Accumulate and report statistics at end of a transaction.
  *
  * (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
  */
 static void
 processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3900,10 +4568,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 {
 	double		latency = 0.0,
 				lag = 0.0;
-	bool		thread_details = progress || throttle_delay || latency_limit,
-				detailed = thread_details || use_log || per_script_stats;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+						   use_log || per_script_stats;
 
-	if (detailed && !skipped)
+	if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
 	{
 		pg_time_now_lazy(now);
 
@@ -3912,20 +4580,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 		lag = st->txn_begin - st->txn_scheduled;
 	}
 
-	if (thread_details)
-	{
-		/* keep detailed thread stats */
-		accumStats(&thread->stats, skipped, latency, lag);
+	/* keep detailed thread stats */
+	accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->retries);
 
-		/* count transactions over the latency limit, if needed */
-		if (latency_limit && latency > latency_limit)
-			thread->latency_late++;
-	}
-	else
-	{
-		/* no detailed stats, just count */
-		thread->stats.cnt++;
-	}
+	/* count transactions over the latency limit, if needed */
+	if (latency_limit && latency > latency_limit)
+		thread->latency_late++;
 
 	/* client stat is just counting */
 	st->cnt++;
@@ -3935,7 +4595,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 
 	/* XXX could use a mutex here, but we choose not to */
 	if (per_script_stats)
-		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+				   st->estatus, st->retries);
 }
 
 
@@ -4782,6 +5443,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
 	my_command->type = SQL_COMMAND;
 	my_command->meta = META_NONE;
 	my_command->argc = 0;
+	my_command->retries = 0;
+	my_command->failures = 0;
 	memset(my_command->argv, 0, sizeof(my_command->argv));
 	my_command->varprefix = NULL;	/* allocated later, if needed */
 	my_command->expr = NULL;
@@ -5450,7 +6113,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 {
 	/* generate and show report */
 	pg_time_usec_t run = now - *last_report;
-	int64		ntx;
+	int64		cnt,
+				failures,
+				retried;
 	double		tps,
 				total_run,
 				latency,
@@ -5477,23 +6142,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 		mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
 		cur.cnt += threads[i].stats.cnt;
 		cur.skipped += threads[i].stats.skipped;
+		cur.retries += threads[i].stats.retries;
+		cur.retried += threads[i].stats.retried;
+		cur.serialization_failures +=
+			threads[i].stats.serialization_failures;
+		cur.deadlock_failures += threads[i].stats.deadlock_failures;
 	}
 
 	/* we count only actually executed transactions */
-	ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+	cnt = cur.cnt - last->cnt;
 	total_run = (now - test_start) / 1000000.0;
-	tps = 1000000.0 * ntx / run;
-	if (ntx > 0)
+	tps = 1000000.0 * cnt / run;
+	if (cnt > 0)
 	{
-		latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
-		sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+		latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+		sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
 		stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-		lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+		lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
 	}
 	else
 	{
 		latency = sqlat = stdev = lag = 0;
 	}
+	failures = getFailures(&cur) - getFailures(last);
+	retried = cur.retried - last->retried;
 
 	if (progress_timestamp)
 	{
@@ -5506,8 +6178,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 	}
 
 	fprintf(stderr,
-			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
-			tbuf, tps, latency, stdev);
+			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+			tbuf, tps, latency, stdev, failures);
 
 	if (throttle_delay)
 	{
@@ -5516,6 +6188,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 			fprintf(stderr, ", " INT64_FORMAT " skipped",
 					cur.skipped - last->skipped);
 	}
+
+	/* it can be non-zero only if max_tries is not equal to one */
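+	/* e.g. (hypothetical): "..., 12 failed, 7 retried, 9 retries" */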
+	if (max_tries != 1)
+		fprintf(stderr,
+				", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+				retried, cur.retries - last->retries);
 	fprintf(stderr, "\n");
 
 	*last = cur;
@@ -5575,9 +6253,10 @@ printResults(StatsData *total,
 			 int64 latency_late)
 {
 	/* tps is about actually executed transactions during benchmarking */
-	int64		ntx = total->cnt - total->skipped;
+	int64		failures = getFailures(total);
+	int64		total_cnt = total->cnt + total->skipped + failures;
 	double		bench_duration = PG_TIME_GET_DOUBLE(total_duration);
-	double		tps = ntx / bench_duration;
+	double		tps = total->cnt / bench_duration;
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -5594,35 +6273,65 @@ printResults(StatsData *total,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   ntx, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
 		printf("duration: %d s\n", duration);
 		printf("number of transactions actually processed: " INT64_FORMAT "\n",
-			   ntx);
+			   total->cnt);
+	}
+
+	if (failures > 0)
+	{
+		printf("number of transactions failed: " INT64_FORMAT " (%.3f%%)\n",
+			   failures, 100.0 * failures / total_cnt);
+
+		if (failures_detailed)
+		{
+			if (total->serialization_failures)
+				printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+					   total->serialization_failures,
+					   100.0 * total->serialization_failures / total_cnt);
+			if (total->deadlock_failures)
+				printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+					   total->deadlock_failures,
+					   100.0 * total->deadlock_failures / total_cnt);
+		}
 	}
 
+	/* it can be non-zero only if max_tries is not equal to one */
+	if (total->retried > 0)
+	{
+		printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+			   total->retried, 100.0 * total->retried / total_cnt);
+		printf("number of total retries: " INT64_FORMAT "\n", total->retries);
+	}
+
+	if (max_tries)
+		printf("maximum number of tries: %d\n", max_tries);
+
 	/* Remaining stats are nonsensical if we failed to execute any xacts */
-	if (total->cnt <= 0)
+	if (total->cnt + total->skipped <= 0)
 		return;
 
 	if (throttle_delay && latency_limit)
 		printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
-			   total->skipped, 100.0 * total->skipped / total->cnt);
+			   total->skipped, 100.0 * total->skipped / total_cnt);
 
 	if (latency_limit)
 		printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
-			   latency_limit / 1000.0, latency_late, ntx,
-			   (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+			   latency_limit / 1000.0, latency_late, total->cnt,
+			   (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
 
 	if (throttle_delay || progress || latency_limit)
 		printSimpleStats("latency", &total->latency);
 	else
 	{
 		/* no measurement, show average latency computed from run time */
-		printf("latency average = %.3f ms\n",
-			   0.001 * total_duration * nclients / total->cnt);
+		printf("latency average = %.3f ms%s\n",
+			   0.001 * total_duration * nclients / total_cnt,
+			   failures > 0 ? " (including failures)" : "");
 	}
 
 	if (throttle_delay)
@@ -5648,7 +6357,7 @@ printResults(StatsData *total,
 	 */
 	if (is_connect)
 	{
-		printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+		printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
 		printf("tps = %f (including reconnection times)\n", tps);
 	}
 	else
@@ -5667,6 +6376,9 @@ printResults(StatsData *total,
 			if (per_script_stats)
 			{
 				StatsData  *sstats = &sql_script[i].stats;
+				int64		script_failures = getFailures(sstats);
+				int64		script_total_cnt =
+					sstats->cnt + sstats->skipped + script_failures;
 
 				printf("SQL script %d: %s\n"
 					   " - weight: %d (targets %.1f%% of total)\n"
@@ -5676,25 +6388,60 @@ printResults(StatsData *total,
 					   100.0 * sql_script[i].weight / total_weight,
 					   sstats->cnt,
 					   100.0 * sstats->cnt / total->cnt,
-					   (sstats->cnt - sstats->skipped) / bench_duration);
+					   sstats->cnt / bench_duration);
+
+				if (failures > 0)
+				{
+					printf(" - number of transactions failed: " INT64_FORMAT " (%.3f%%)\n",
+						   script_failures,
+						   100.0 * script_failures / script_total_cnt);
 
-				if (throttle_delay && latency_limit && sstats->cnt > 0)
+					if (failures_detailed)
+					{
+						if (total->serialization_failures)
+							printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+								   sstats->serialization_failures,
+								   (100.0 * sstats->serialization_failures /
+									script_total_cnt));
+						if (total->deadlock_failures)
+							printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+								   sstats->deadlock_failures,
+								   (100.0 * sstats->deadlock_failures /
+									script_total_cnt));
+					}
+				}
+
+				/* it can be non-zero only if max_tries is not equal to one */
+				if (total->retried > 0)
+				{
+					printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+						   sstats->retried,
+						   100.0 * sstats->retried / script_total_cnt);
+					printf(" - number of total retries: " INT64_FORMAT "\n",
+						   sstats->retries);
+				}
+
+				if (throttle_delay && latency_limit && script_total_cnt > 0)
 					printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
 						   sstats->skipped,
-						   100.0 * sstats->skipped / sstats->cnt);
+						   100.0 * sstats->skipped / script_total_cnt);
 
 				printSimpleStats(" - latency", &sstats->latency);
 			}
 
-			/* Report per-command latencies */
+			/*
+			 * Report per-command statistics: latencies, retries after errors,
+			 * failures (errors without retrying).
+			 */
 			if (report_per_command)
 			{
 				Command   **commands;
 
-				if (per_script_stats)
-					printf(" - statement latencies in milliseconds:\n");
-				else
-					printf("statement latencies in milliseconds:\n");
+				printf("%sstatement latencies in milliseconds%s:\n",
+					   per_script_stats ? " - " : "",
+					   (max_tries == 1 ?
+						" and failures" :
+						", failures and retries"));
 
 				for (commands = sql_script[i].commands;
 					 *commands != NULL;
@@ -5702,10 +6449,19 @@ printResults(StatsData *total,
 				{
 					SimpleStats *cstats = &(*commands)->stats;
 
-					printf("   %11.3f  %s\n",
-						   (cstats->count > 0) ?
-						   1000.0 * cstats->sum / cstats->count : 0.0,
-						   (*commands)->first_line);
+					if (max_tries == 1)
+						printf("   %11.3f  %10" INT64_MODIFIER "d  %s\n",
+							   (cstats->count > 0) ?
+							   1000.0 * cstats->sum / cstats->count : 0.0,
+							   (*commands)->failures,
+							   (*commands)->first_line);
+					else
+						printf("   %11.3f  %10" INT64_MODIFIER "d  %10" INT64_MODIFIER "d  %s\n",
+							   (cstats->count > 0) ?
+							   1000.0 * cstats->sum / cstats->count : 0.0,
+							   (*commands)->failures,
+							   (*commands)->retries,
+							   (*commands)->first_line);
 				}
 			}
 		}
@@ -5786,7 +6542,7 @@ main(int argc, char **argv)
 		{"progress", required_argument, NULL, 'P'},
 		{"protocol", required_argument, NULL, 'M'},
 		{"quiet", no_argument, NULL, 'q'},
-		{"report-latencies", no_argument, NULL, 'r'},
+		{"report-per-command", no_argument, NULL, 'r'},
 		{"rate", required_argument, NULL, 'R'},
 		{"scale", required_argument, NULL, 's'},
 		{"select-only", no_argument, NULL, 'S'},
@@ -5808,6 +6564,9 @@ main(int argc, char **argv)
 		{"show-script", required_argument, NULL, 10},
 		{"partitions", required_argument, NULL, 11},
 		{"partition-method", required_argument, NULL, 12},
+		{"failures-detailed", no_argument, NULL, 13},
+		{"max-tries", required_argument, NULL, 14},
+		{"debug-errors", no_argument, NULL, 15},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -5845,6 +6604,7 @@ main(int argc, char **argv)
 
 	PGconn	   *con;
 	char	   *env;
+	bool		retry = false;	/* retry transactions with errors or not */
 
 	int			exit_code = 0;
 
@@ -6176,6 +6936,36 @@ main(int argc, char **argv)
 					exit(1);
 				}
 				break;
+			case 13:			/* failures-detailed */
+				benchmarking_option_set = true;
+				failures_detailed = true;
+				break;
+			case 14:			/* max-tries */
+				{
+					int32		max_tries_arg = atoi(optarg);
+
+					if (max_tries_arg < 0)
+					{
+						pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+						exit(1);
+					}
+
+					benchmarking_option_set = true;
+
+					/*
+					 * Always retry transactions with errors if this option is
+					 * used. But if its value is 0, the number of tries must
+					 * instead be limited by --latency-limit or a duration (-T).
+					 */
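+
+					/*
+					 * Example invocations (hypothetical): "--max-tries=2"
+					 * allows one retry per failed transaction, while
+					 * "--max-tries=0 -T 60" retries until the duration
+					 * expires.
+					 */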
+					retry = true;
+
+					max_tries = (uint32) max_tries_arg;
+				}
+				break;
+			case 15:			/* debug-errors */
+				benchmarking_option_set = true;
+				debug_errors = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -6357,6 +7147,20 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (!max_tries)
+	{
+		if (retry && !(latency_limit || duration > 0))
+		{
+			pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+			exit(1);
+		}
+		else if (!retry)
+		{
+			/* By default transactions with errors are not retried */
+			max_tries = 1;
+		}
+	}
+
 	/*
 	 * save main process id in the global variable because process id will be
 	 * changed after fork.
@@ -6565,6 +7369,10 @@ main(int argc, char **argv)
 		mergeSimpleStats(&stats.lag, &thread->stats.lag);
 		stats.cnt += thread->stats.cnt;
 		stats.skipped += thread->stats.skipped;
+		stats.retries += thread->stats.retries;
+		stats.retried += thread->stats.retried;
+		stats.serialization_failures += thread->stats.serialization_failures;
+		stats.deadlock_failures += thread->stats.deadlock_failures;
 		latency_late += thread->latency_late;
 		conn_total_duration += thread->conn_duration;
 
@@ -6713,7 +7521,8 @@ threadRun(void *arg)
 				if (min_usec > this_usec)
 					min_usec = this_usec;
 			}
-			else if (st->state == CSTATE_WAIT_RESULT)
+			else if (st->state == CSTATE_WAIT_RESULT ||
+					 st->state == CSTATE_WAIT_ROLLBACK_RESULT)
 			{
 				/*
 				 * waiting for result from server - nothing to do unless the
@@ -6802,7 +7611,8 @@ threadRun(void *arg)
 		{
 			CState	   *st = &state[i];
 
-			if (st->state == CSTATE_WAIT_RESULT)
+			if (st->state == CSTATE_WAIT_RESULT ||
+				st->state == CSTATE_WAIT_ROLLBACK_RESULT)
 			{
 				/* don't call advanceConnectionState unless data is available */
 				int			sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 923203ea51..4488cf926c 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
 
 # start a pgbench specific server
 my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
 $node->start;
 
 # invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
 		qr{builtin: TPC-B},
 		qr{clients: 2\b},
 		qr{processed: 10/10},
-		qr{mode: simple}
+		qr{mode: simple},
+		qr{maximum number of tries: 1}
 	],
 	[qr{^$}],
 	'pgbench tpcb-like');
@@ -1225,6 +1230,214 @@ pgbench(
 check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
 	qr{^\d \d{1,2} \d+ \d \d+ \d+$});
 
+# the client aborts if the script contains an incomplete transaction block
+pgbench(
+	'--no-vacuum', 2, [ qr{processed: 1/10} ],
+	[ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+	'incomplete transaction block',
+	{ '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test concurrent updates of the same table row, and deadlocks.
+
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE first_client_table (value integer); '
+  . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+  . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we get a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+    "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+  . "ERROR:  could not serialize access due to concurrent update\\b.*"
+  . "\\g1";
+
+pgbench(
+	"-n -c 2 -t 1 -d --debug-errors --max-tries 2",
+	0,
+	[ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+	  qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+	[ qr/$err_pattern/s ],
+	'concurrent update with retrying',
+	{
+		'001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+  exists boolean;
+  waiters integer;
+BEGIN
+  -- The second client always comes in second, and the number of rows in the
+  -- table first_client_table reflects this. Here the first client inserts a row,
+  -- so the second client will see a non-empty table when repeating the
+  -- transaction after the serialization error.
+  SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+  IF NOT exists THEN
+	-- Let the second client begin
+	PERFORM pg_advisory_unlock(0);
+	-- And wait until the second client tries to get the same lock
+	LOOP
+	  SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+	  locktype = 'advisory' AND objsubid = 1 AND
+	  ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+	  IF waiters = 1 THEN
+		INSERT INTO first_client_table VALUES (1);
+
+		-- Exit loop
+		EXIT;
+	  END IF;
+	END LOOP;
+  END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+	});
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+	"client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+  . "ERROR:  deadlock detected\\b";
+
+pgbench(
+	"-n -c 2 -t 1 --max-tries 2 --debug-errors",
+	0,
+	[ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+	  qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+	[ qr{$err_pattern} ],
+	'deadlock with retrying',
+	{
+		'001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+  exists boolean;
+BEGIN
+  SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+  IF exists THEN
+	-- We are the second client or the client with the deadlock error
+
+	-- The first client will release this lock by itself (see below)
+	PERFORM pg_advisory_unlock(0);
+
+	PERFORM pg_advisory_lock(3);
+
+	-- The second client can get a deadlock here
+	PERFORM pg_advisory_lock(2);
+  ELSE
+	-- We are the first client
+
+	-- This code should not be used in a new transaction after an error
+	INSERT INTO first_client_table VALUES (1);
+
+	PERFORM pg_advisory_lock(2);
+  END IF;
+END$$;
+
+DO $$
+DECLARE
+  num_rows integer;
+  waiters integer;
+BEGIN
+  -- Check if we are the first client
+  SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+  IF num_rows = 1 THEN
+	-- This code should not be used in a new transaction after an error
+	INSERT INTO first_client_table VALUES (2);
+
+	-- Let the second client begin
+	PERFORM pg_advisory_unlock(0);
+	PERFORM pg_advisory_unlock(1);
+
+	-- Make sure the second client is ready for deadlock
+	LOOP
+	  SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+	  locktype = 'advisory' AND
+	  objsubid = 1 AND
+	  ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+	  NOT granted;
+
+	  IF waiters = 1 THEN
+	    -- Exit loop
+		EXIT;
+	  END IF;
+	END LOOP;
+
+	PERFORM pg_advisory_lock(0);
+    -- And the second client takes care of lock 1 by itself
+  END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+	});
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
 # done
 $node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
 $node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 9023fac52d..5bf9ab1f0e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -179,6 +179,16 @@ my @options = (
 		'-i --partition-method=hash',
 		[qr{partition-method requires greater than zero --partitions}]
 	],
+	[
+		'bad maximum number of tries',
+		'--max-tries -10',
+		[qr{invalid number of maximum tries: "-10"}]
+	],
+	[
+		'an infinite number of tries',
+		'--max-tries 0',
+		[qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+	],
 
 	# logging sub-options
 	[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
 }
 
 /*
- * destroy stack
+ * Destroy all the elements of the stack. The stack itself is not freed.
  */
 void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
 {
+	if (!cstack)
+		return;					/* nothing to do here */
+
 	while (conditional_stack_pop(cstack))
 		continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+	conditional_stack_reset(cstack);
 	free(cstack);
 }
 
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
 
 extern ConditionalStack conditional_stack_create(void);
 
+extern void conditional_stack_reset(ConditionalStack cstack);
+
 extern void conditional_stack_destroy(ConditionalStack cstack);
 
 extern int	conditional_stack_depth(ConditionalStack cstack);
-- 
2.17.1

>From fd3e1e0203f8b2b0c500d9f1f9905d315e97b6f6 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v12 1/2] Pgbench errors: use the Variables structure for
 client variables

This is needed mainly to reset client variables when repeating transactions
after serialization/deadlock failures.

Don't allocate Variable structs one by one; instead, grow the array by a
constant margin each time it overflows.
---
 src/bin/pgbench/pgbench.c | 169 ++++++++++++++++++++++++--------------
 1 file changed, 106 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..8acda86cad 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -304,6 +310,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables   variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1418,39 +1440,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
 	return (Variable *) bsearch((void *) &key,
-								(void *) st->variables,
-								st->nvariables,
+								(void *) variables->vars,
+								variables->nvars,
 								sizeof(Variable),
 								compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
 {
 	Variable   *var;
 	char		stringform[64];
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 		return NULL;			/* not found */
 
@@ -1582,21 +1604,43 @@ valid_variable_name(const char *name)
 	return true;
 }
 
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array. It is assumed that the sum of the number of current variables and the
+ * number of needed variables is less than or equal to (INT_MAX -
+ * VARIABLES_ALLOC_MARGIN).
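+ *
+ * For example (hypothetical numbers): with nvars = 5, max_vars = 5 and
+ * needed = 1, the array is reallocated to 6 + VARIABLES_ALLOC_MARGIN = 14
+ * slots, so several subsequent appends need no reallocation.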
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+	/* total number of variables required now */
+	needed += variables->nvars;
+
+	if (variables->max_vars < needed)
+	{
+		/*
+		 * We don't want to allocate variables one by one; for efficiency, add a
+		 * constant margin each time it overflows.
+		 */
+		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+		variables->vars = (Variable *)
+			pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+	}
+}
+
 /*
  * Lookup a variable by name, creating it if need be.
  * Caller is expected to assign a value to the variable.
  * Returns NULL on failure (bad name).
  */
 static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
 {
 	Variable   *var;
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 	{
-		Variable   *newvars;
-
 		/*
 		 * Check for the name only when declaring a new variable to avoid
 		 * overhead.
@@ -1608,23 +1652,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
 		}
 
 		/* Create variable at the end of the array */
-		if (st->variables)
-			newvars = (Variable *) pg_realloc(st->variables,
-											  (st->nvariables + 1) * sizeof(Variable));
-		else
-			newvars = (Variable *) pg_malloc(sizeof(Variable));
-
-		st->variables = newvars;
+		enlargeVariables(variables, 1);
 
-		var = &newvars[st->nvariables];
+		var = &(variables->vars[variables->nvars]);
 
 		var->name = pg_strdup(name);
 		var->svalue = NULL;
 		/* caller is expected to initialize remaining fields */
 
-		st->nvariables++;
+		variables->nvars++;
 		/* we don't re-sort the array till we have to */
-		st->vars_sorted = false;
+		variables->vars_sorted = false;
 	}
 
 	return var;
@@ -1633,12 +1671,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
 /* Assign a string value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+			const char *value)
 {
 	Variable   *var;
 	char	   *val;
 
-	var = lookupCreateVariable(st, context, name);
+	var = lookupCreateVariable(variables, context, name);
 	if (!var)
 		return false;
 
@@ -1656,12 +1695,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
 /* Assign a value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
 				 const PgBenchValue *value)
 {
 	Variable   *var;
 
-	var = lookupCreateVariable(st, context, name);
+	var = lookupCreateVariable(variables, context, name);
 	if (!var)
 		return false;
 
@@ -1676,12 +1715,13 @@ putVariableValue(CState *st, const char *context, char *name,
 /* Assign an integer value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+			   int64 value)
 {
 	PgBenchValue val;
 
 	setIntValue(&val, value);
-	return putVariableValue(st, context, name, &val);
+	return putVariableValue(variables, context, name, &val);
 }
 
 /*
@@ -1740,7 +1780,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
 }
 
 static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
 {
 	char	   *p,
 			   *name,
@@ -1761,7 +1801,7 @@ assignVariables(CState *st, char *sql)
 			continue;
 		}
 
-		val = getVariable(st, name);
+		val = getVariable(variables, name);
 		free(name);
 		if (val == NULL)
 		{
@@ -1776,12 +1816,13 @@ assignVariables(CState *st, char *sql)
 }
 
 static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+			   const char **params)
 {
 	int			i;
 
 	for (i = 0; i < command->argc - 1; i++)
-		params[i] = getVariable(st, command->argv[i + 1]);
+		params[i] = getVariable(variables, command->argv[i + 1]);
 }
 
 static char *
@@ -2647,7 +2688,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
 			{
 				Variable   *var;
 
-				if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+				if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
 				{
 					pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
 					return false;
@@ -2717,7 +2758,7 @@ getMetaCommand(const char *cmd)
  * Return true if succeeded, or false on error.
  */
 static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
 {
 	char		command[SHELL_COMMAND_SIZE];
 	int			i,
@@ -2748,7 +2789,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 		{
 			arg = argv[i] + 1;	/* a string literal starting with colons */
 		}
-		else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+		else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
 		{
 			pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
 			return false;
@@ -2809,7 +2850,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 		pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
 		return false;
 	}
-	if (!putVariableInt(st, "setshell", variable, retval))
+	if (!putVariableInt(variables, "setshell", variable, retval))
 		return false;
 
 	pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2861,7 +2902,7 @@ sendCommand(CState *st, Command *command)
 		char	   *sql;
 
 		sql = pg_strdup(command->argv[0]);
-		sql = assignVariables(st, sql);
+		sql = assignVariables(&st->variables, sql);
 
 		pg_log_debug("client %d sending %s", st->id, sql);
 		r = PQsendQuery(st->con, sql);
@@ -2872,7 +2913,7 @@ sendCommand(CState *st, Command *command)
 		const char *sql = command->argv[0];
 		const char *params[MAX_ARGS];
 
-		getQueryParams(st, command, params);
+		getQueryParams(&st->variables, command, params);
 
 		pg_log_debug("client %d sending %s", st->id, sql);
 		r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2919,7 +2960,7 @@ sendCommand(CState *st, Command *command)
 			st->prepared[st->use_file] = true;
 		}
 
-		getQueryParams(st, command, params);
+		getQueryParams(&st->variables, command, params);
 		preparedStatementName(name, st->use_file, st->command);
 
 		pg_log_debug("client %d sending %s", st->id, name);
@@ -3012,7 +3053,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 							varname = psprintf("%s%s", varprefix, varname);
 
 						/* store last row result as a string */
-						if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+						if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
 										 PQgetvalue(res, ntuples - 1, fld)))
 						{
 							/* internal error */
@@ -3073,14 +3114,14 @@ error:
  * of delay, in microseconds.  Returns true on success, false on error.
  */
 static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
 {
 	char	   *var;
 	int			usec;
 
 	if (*argv[1] == ':')
 	{
-		if ((var = getVariable(st, argv[1] + 1)) == NULL)
+		if ((var = getVariable(variables, argv[1] + 1)) == NULL)
 		{
 			pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
 			return false;
@@ -3612,7 +3653,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 		 * latency will be recorded in CSTATE_SLEEP state, not here, after the
 		 * delay has elapsed.)
 		 */
-		if (!evaluateSleep(st, argc, argv, &usec))
+		if (!evaluateSleep(&st->variables, argc, argv, &usec))
 		{
 			commandFailed(st, "sleep", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3633,7 +3674,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 			return CSTATE_ABORTED;
 		}
 
-		if (!putVariableValue(st, argv[0], argv[1], &result))
+		if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
 		{
 			commandFailed(st, "set", "assignment of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3703,7 +3744,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	}
 	else if (command->meta == META_SETSHELL)
 	{
-		if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+		if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
 		{
 			commandFailed(st, "setshell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3711,7 +3752,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	}
 	else if (command->meta == META_SHELL)
 	{
-		if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+		if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
 		{
 			commandFailed(st, "shell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -5993,7 +6034,7 @@ main(int argc, char **argv)
 					}
 
 					*p++ = '\0';
-					if (!putVariable(&state[0], "option", optarg, p))
+					if (!putVariable(&state[0].variables, "option", optarg, p))
 						exit(1);
 				}
 				break;
@@ -6333,19 +6374,19 @@ main(int argc, char **argv)
 			int			j;
 
 			state[i].id = i;
-			for (j = 0; j < state[0].nvariables; j++)
+			for (j = 0; j < state[0].variables.nvars; j++)
 			{
-				Variable   *var = &state[0].variables[j];
+				Variable   *var = &state[0].variables.vars[j];
 
 				if (var->value.type != PGBT_NO_VALUE)
 				{
-					if (!putVariableValue(&state[i], "startup",
+					if (!putVariableValue(&state[i].variables, "startup",
 										  var->name, &var->value))
 						exit(1);
 				}
 				else
 				{
-					if (!putVariable(&state[i], "startup",
+					if (!putVariable(&state[i].variables, "startup",
 									 var->name, var->svalue))
 						exit(1);
 				}
@@ -6380,11 +6421,11 @@ main(int argc, char **argv)
 	 * :scale variables normally get -s or database scale, but don't override
 	 * an explicit -D switch
 	 */
-	if (lookupVariable(&state[0], "scale") == NULL)
+	if (lookupVariable(&state[0].variables, "scale") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
 		{
-			if (!putVariableInt(&state[i], "startup", "scale", scale))
+			if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
 				exit(1);
 		}
 	}
@@ -6393,30 +6434,32 @@ main(int argc, char **argv)
 	 * Define a :client_id variable that is unique per connection. But don't
 	 * override an explicit -D switch.
 	 */
-	if (lookupVariable(&state[0], "client_id") == NULL)
+	if (lookupVariable(&state[0].variables, "client_id") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "client_id", i))
+			if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
 				exit(1);
 	}
 
 	/* set default seed for hash functions */
-	if (lookupVariable(&state[0], "default_seed") == NULL)
+	if (lookupVariable(&state[0].variables, "default_seed") == NULL)
 	{
 		uint64		seed =
 		((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
 		(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
 
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+			if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+								(int64) seed))
 				exit(1);
 	}
 
 	/* set random seed unless overwritten */
-	if (lookupVariable(&state[0], "random_seed") == NULL)
+	if (lookupVariable(&state[0].variables, "random_seed") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+			if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+								random_seed))
 				exit(1);
 	}
 
-- 
2.17.1
