Hello Fabien,

I attached the updated patch (v14)!

On Wed, 30 Jun 2021 17:33:24 +0200 (CEST)
Fabien COELHO <coe...@cri.ensmp.fr> wrote:

> >> --report-latencies -> --report-per-command: should we keep supporting
> >> the previous option?
> >
> > Ok. Although now the option is not only for latencies, considering users who
> > are using the existing option, I'm fine with this. I got back this to the
> > previous name.
> 
> Hmmm. I liked the new name! My point was whether we need to support the 
> old one as well for compatibility, or whether we should not bother. I'm 
> still wondering. As I think that the new name is better, I'd suggest to 
> keep it.

Ok. I misunderstood it. I reverted the option name to --report-per-command.

If we keep --report-latencies, I can imagine the following choices:
- use --report-latencies to print only latency information
- use --report-latencies as an alias of --report-per-command for compatibility
  and remove it at an appropriate time (that is, treat it as deprecated)

Of these, I prefer the latter because ISTM we would not need many options for
reporting per-command information. However, I wonder whether we actually need
to keep the previous option at all if we plan to remove it eventually.
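The "deprecated alias" choice above could be sketched roughly as follows (a minimal illustration, not pgbench's actual C option table; pgbench uses getopt_long, but the same idea is shown here with Python's argparse, where several option strings can set one flag):

```python
import argparse

# Hypothetical sketch: both spellings set the same flag, so scripts that
# still pass --report-latencies keep working while --report-per-command
# becomes the documented name.
parser = argparse.ArgumentParser()
parser.add_argument("--report-per-command", "--report-latencies",
                    action="store_true", dest="report_per_command")

def per_command_enabled(argv):
    """Return True if per-command reporting was requested via either name."""
    return parser.parse_args(argv).report_per_command
```

With this shape, the old spelling can later be dropped by removing one option string without touching any other code.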

> >> --failures-detailed: if we bother to run with handling failures, should
> >> it always be on?
> >
> > If we print other failures that cannot be retried in future, it could a lot
> > of lines and might make some users who don't need details of failures 
> > annoyed.
> > Moreover, some users would always need information of detailed failures in 
> > log,
> > and others would need only total numbers of failures.
> 
> Ok.
> 
> > Currently we handle only serialization and deadlock failures, so the number 
> > of
> > lines printed and the number of columns of logging is not large even under 
> > the
> > failures-detail, but if we have a chance to handle other failures in future,
> > ISTM adding this option makes sense considering users who would like simple
> > outputs.
> 
> Hmmm. What kind of failures could be managed with retries? I guess that on 
> a connection failure we can try to reconnect, but otherwise it is less 
> clear that other failures make sense to retry.

Indeed, there would be few failures that we should retry, and I cannot imagine
any other than serialization, deadlock, and connection failures for now.
However, considering reporting the number of failed transactions and their
causes in the future, as you said

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed. 

, we could define a few more kinds of failures. At least we can consider
meta-command errors and other SQL command errors in addition to serialization,
deadlock, and connection failures. So the total number of failure kinds would
be at least five, and always reporting all of them would result in a lot of
lines and columns in the logs.

> >> --debug-errors: I'm not sure we should want a special debug mode for that,
> >> I'd consider integrating it with the standard debug, or just for 
> >> development.
> >
> > I think --debug is a debug option for telling users the pgbench's internal
> > behaviors, that is, which client is doing what. On other hand, 
> > --debug-errors
> > is for telling users what error caused a retry or a failure in detail. For
> > users who are not interested in pgbench's internal behavior (sending a 
> > command,
> > receiving a result, ... ) but interested in actual errors raised during 
> > running
> > script, this option seems useful.
> 
> Ok. The this is not really about debug per se, but a verbosity setting?

I think so.

> Maybe --verbose-errors would make more sense? I'm unsure. I'll think about 
> it.

Agreed. This seems more appropriate than the previous name, so I changed it to
--verbose-errors.

> > Sorry, I couldn't understand your suggestion. Is this about the order of 
> > case
> > statements or pg_log_error?
> 
> My sentence got mixed up. My point was about the case order, so that they 
> are put in a more logical order when reading all the cases.

Ok. Considering the logical order, I moved WAIT_ROLLBACK_RESULT between ERROR
and RETRY, because WAIT_ROLLBACK_RESULT comes after the ERROR state, and RETRY
comes after either ERROR or WAIT_ROLLBACK_RESULT.
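That ordering can be pictured as a tiny state fragment (illustrative only: the names follow this thread's discussion of the pgbench client state machine, not the actual enum in pgbench.c):

```python
from enum import IntEnum, auto

class ClientState(IntEnum):
    # Declared in the "logical" reading order discussed above.
    ERROR = auto()
    WAIT_ROLLBACK_RESULT = auto()  # entered after ERROR when a ROLLBACK is sent
    RETRY = auto()                 # entered after ERROR or WAIT_ROLLBACK_RESULT

# Allowed forward transitions between these three states.
TRANSITIONS = {
    ClientState.ERROR: {ClientState.WAIT_ROLLBACK_RESULT, ClientState.RETRY},
    ClientState.WAIT_ROLLBACK_RESULT: {ClientState.RETRY},
}
```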

> >> Currently, ISTM that the retry on error mode is implicitely always on.
> >> Do we want that? I'd say yes, but maybe people could disagree.
> >
> > The default values of max-tries is 1, so the retry on error is off.
> 
> > Failed transactions are retried only when the user wants it and
> > specifies a valid value to max-treis.
> 
> Ok. My point is that we do not stop on such errors, whereas before ISTM 
> that we would have stopped, so somehow the default behavior has changed 
> and the previous behavior cannot be reinstated with an option. Maybe that 
> is not bad, but this is a behavioral change which needs to be documented 
> and argumented.

I understood. Indeed, there is a behavioural change regarding whether we abort
the client after some types of errors. Now, serialization / deadlock errors do
not cause the client to abort and are recorded as failures, whereas other
errors still abort the client.

If we want to record other errors as failures in the future, we will need a
new option to specify which types of failures (or perhaps all types of errors)
should be reported. Until then, ISTM we can treat serialization and deadlock
errors as special cases to be reported as failures.

I rewrote the "Failures and Serialization/Deadlock Retries" section a bit to
emphasize that such errors are treated differently from other errors.

> >> See the attached files for generating deadlocks reliably (start with 2 
> >> clients). What do you think? The PL/pgSQL minimal, it is really 
> >> client-code oriented.
> >
> > Sorry, but I cannot find the attached file.
> 
> Sorry. Attached to this mail. The serialization stuff does not seem to 
> work as well as the deadlock one. Run with 2 clients.

Hmmm, your tests didn't work well for me. Both got stuck in
pgbench_deadlock_wait() and pgbench didn't finish.


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nag...@sraoss.co.jp>
>From feb0391c642a72568b45946f4aae19f8976fd713 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v14 2/2] Pgbench errors and serialization/deadlock retries

A client's run is aborted in case of a serious error, for example, when the
connection with the database server is lost or when the end of the script is
reached without completing the last transaction. In addition, if execution of an
SQL or meta command fails for reasons other than serialization or deadlock
errors, the client is aborted. Otherwise, if an SQL command fails with a
serialization or deadlock error, the current transaction is rolled back, which
also includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).

Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option).  These options can be combined;
moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, the transaction is
reported as failed, and the client variables are set as they were before the
first run of this transaction.

If there are retries and/or failures, their statistics are printed in the
progress reports, in the transaction / aggregation logs, and at the end
together with the other results (in total and for each script). Retries and
failures are also printed per command, together with average latencies, if you
use the appropriate benchmarking option (--report-per-command, -r). To group
failures by basic type (serialization failures / deadlock failures), use the
--failures-detailed option.

To distinguish all errors and failures (errors without retrying) by type,
including which retry limit was violated and how far it was exceeded for
serialization/deadlock failures, use the --verbose-errors option.
---
 doc/src/sgml/ref/pgbench.sgml                | 402 +++++++-
 src/bin/pgbench/pgbench.c                    | 954 +++++++++++++++++--
 src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  10 +
 src/fe_utils/conditional.c                   |  16 +-
 src/include/fe_utils/conditional.h           |   2 +
 6 files changed, 1485 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 10000/10000
+maximum number of tries: 1
 latency average = 11.013 ms
 latency stddev = 7.351 ms
 initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
 </screen>
 
   The first six lines report some of the most important parameter
-  settings.  The next line reports the number of transactions completed
+  settings.  The seventh line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
-  failed before completion.  (In <option>-T</option> mode, only the actual
-  number of transactions is printed.)
+  failed before completion or some SQL command(s) failed.  (In
+  <option>-T</option> mode, only the actual number of transactions is printed.)
+  The next line reports the maximum number of tries for transactions with
+  serialization or deadlock errors (see <xref linkend="failures-and-retries"
+  endterm="failures-and-retries-title"/> for more information).
   The last line reports the number of transactions per second.
  </para>
 
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         at all. They are counted and reported separately as
         <firstterm>skipped</firstterm>.
        </para>
+       <para>
+        When the <option>--max-tries</option> option is used, a transaction with
+        a serialization or deadlock error cannot be retried if the total time of
+        all its tries is greater than <replaceable>limit</replaceable> ms. To
+        limit only the time of tries and not their number, use
+        <literal>--max-tries=0</literal>. By default option
+        <option>--max-tries</option> is set to 1 and transactions with
+        serialization/deadlock errors are not retried. See <xref
+        linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+        for more information about retrying such transactions.
+       </para>
        </listitem>
      </varlistentry>
 
@@ -594,23 +609,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
        <para>
         Show progress report every <replaceable>sec</replaceable> seconds.  The report
         includes the time since the beginning of the run, the TPS since the
-        last report, and the transaction latency average and standard
-        deviation since the last report.  Under throttling (<option>-R</option>),
-        the latency is computed with respect to the transaction scheduled
-        start time, not the actual transaction beginning time, thus it also
-        includes the average schedule lag time.
+        last report, and the transaction latency average, standard deviation,
+        and the number of failed transactions since the last report. Under
+        throttling (<option>-R</option>), the latency is computed with respect
+        to the transaction scheduled start time, not the actual transaction
+        beginning time, thus it also includes the average schedule lag time.
+        When <option>--max-tries</option> is used to enable transactions retries
+        after serialization/deadlock errors, the report includes the number of
+        retried transactions and the sum of all retries.
        </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-r</option></term>
-      <term><option>--report-latencies</option></term>
+      <term><option>--report-per-command</option></term>
       <listitem>
        <para>
-        Report the average per-statement latency (execution time from the
-        perspective of the client) of each command after the benchmark
-        finishes.  See below for details.
+        Report the following statistics for each command after the benchmark
+        finishes: the average per-statement latency (execution time from the
+        perspective of the client), the number of failures and the number of
+        retries after serialization or deadlock errors in this command.  The
+        report displays retry statistics only if the
+        <option>--max-tries</option> option is not equal to 1.
        </para>
       </listitem>
      </varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--failures-detailed</option></term>
+      <listitem>
+       <para>
+        Report failures in per-transaction and aggregation logs, as well as in
+        the main and per-script reports, grouped by the following types:
+        <itemizedlist>
+         <listitem>
+          <para>serialization failures;</para>
+         </listitem>
+         <listitem>
+          <para>deadlock failures;</para>
+         </listitem>
+        </itemizedlist>
+        See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
       <listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+      <listitem>
+       <para>
+        Enable retries for transactions with serialization/deadlock errors and
+        set the maximum number of these tries. This option can be combined with
+        the <option>--latency-limit</option> option which limits the total time
+        of all transaction tries; moreover, you cannot use an unlimited number
+        of tries (<literal>--max-tries=0</literal>) without
+        <option>--latency-limit</option> or <option>--time</option>.
+        The default value is 1 and transactions with serialization/deadlock
+        errors are not retried. See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information about
+        retrying such transactions.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--verbose-errors</option></term>
+      <listitem>
+       <para>
+        Print messages about all errors and failures (errors without retrying)
+        including which limit for retries was violated and how far it was
+        exceeded for the serialization/deadlock failures. (Note that in this
+        case the output can be significantly increased.)
+        See <xref linkend="failures-and-retries"
+        endterm="failures-and-retries-title"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--progress-timestamp</option></term>
       <listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
  <refsect1>
   <title>Notes</title>
 
- <refsect2>
-  <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+  <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
 
   <para>
    <application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
     both old and new versions of <application>pgbench</application>, be sure to write
     each SQL command on a single line ending with a semicolon.
    </para>
+   <para>
+    It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+    transactions. If at runtime the client reaches the end of the script without
+    completing the last transaction block, it will be aborted.
+   </para>
   </note>
 
   <para>
@@ -2207,7 +2285,7 @@ END;
    The format of the log is:
 
 <synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
 </synopsis>
 
    where
@@ -2228,6 +2306,17 @@ END;
    When both <option>--rate</option> and <option>--latency-limit</option> are used,
    the <replaceable>time</replaceable> for a skipped transaction will be reported as
    <literal>skipped</literal>.
+   <replaceable>retries</replaceable> is the sum of all retries after the
+   serialization or deadlock errors during the current script execution. It is
+   present only if the <option>--max-tries</option> option is not equal to 1.
+   If the transaction ends with a failure, its <replaceable>time</replaceable>
+   will be reported as <literal>failed</literal>. If you use the
+   <option>--failures-detailed</option> option, the
+   <replaceable>time</replaceable> of the failed transaction will be reported as
+   <literal>serialization_failure</literal> or
+   <literal>deadlock_failure</literal> depending on the type of failure (see
+   <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+   for more information).
   </para>
 
   <para>
@@ -2256,6 +2345,24 @@ END;
    were already late before they were even started.
   </para>
 
+  <para>
+   The following example shows a snippet of a log file with failures and
+   retries, with the maximum number of tries set to 10 (note the additional
+   <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+  </para>
+
   <para>
    When running a long test on hardware that can handle a lot of transactions,
    the log files can become very large.  The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
    format is used for the log files:
 
 <synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
 </synopsis>
 
    where
@@ -2285,7 +2392,16 @@ END;
    transaction latencies within the interval,
    <replaceable>min_latency</replaceable> is the minimum latency within the interval,
    and
-   <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+   <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+   <replaceable>failures</replaceable> is the number of transactions that ended
+   with a failed SQL command within the interval. If you use option
+   <option>--failures-detailed</option>, instead of the sum of all failed
+   transactions you will get more detailed statistics for the failed
+   transactions grouped by the following types:
+   <replaceable>serialization_failures</replaceable> is the number of
+   transactions that got a serialization error and were not retried after this,
+   <replaceable>deadlock_failures</replaceable> is the number of transactions
+   that got a deadlock error and were not retried after this.
    The next fields,
    <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
    and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
    They provide statistics about the time each transaction had to wait for the
    previous one to finish, i.e., the difference between each transaction's
    scheduled start time and the time it actually started.
-   The very last field, <replaceable>skipped</replaceable>,
+   The next field, <replaceable>skipped</replaceable>,
    is only present if the <option>--latency-limit</option> option is used, too.
    It counts the number of transactions skipped because they would have
    started too late.
+   The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+   fields are present only if the <option>--max-tries</option> option is not
+   equal to 1. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.
    Each transaction is counted in the interval when it was committed.
   </para>
 
   <para>
    Here is some example output:
 <screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
 </screen></para>
 
   <para>
@@ -2319,13 +2439,44 @@ END;
  </refsect2>
 
  <refsect2>
-  <title>Per-Statement Latencies</title>
+  <title>Per-Statement Report</title>
 
   <para>
-   With the <option>-r</option> option, <application>pgbench</application> collects
-   the elapsed transaction time of each statement executed by every
-   client.  It then reports an average of those values, referred to
-   as the latency for each statement, after the benchmark has finished.
+   With the <option>-r</option> option, <application>pgbench</application>
+   collects the following statistics for each statement:
+   <itemizedlist>
+     <listitem>
+       <para>
+         <literal>latency</literal> &mdash; elapsed transaction time for each
+         statement. <application>pgbench</application> reports an average value
+         of all successful runs of the statement.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         The number of failures in this statement. See
+         <xref linkend="failures-and-retries"
+         endterm="failures-and-retries-title"/> for more information.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         The number of retries after a serialization or a deadlock error in this
+         statement. See <xref linkend="failures-and-retries"
+         endterm="failures-and-retries-title"/> for more information.
+       </para>
+     </listitem>
+   </itemizedlist>
+  </para>
+
+  <para>
+   The report displays retry statistics only if the <option>--max-tries</option>
+   option is not equal to 1.
+  </para>
+
+  <para>
+   All values are computed for each statement executed by every client and are
+   reported after the benchmark has finished.
   </para>
 
   <para>
@@ -2339,27 +2490,63 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 10000/10000
+maximum number of tries: 1
 latency average = 10.870 ms
 latency stddev = 7.341 ms
 initial connection time = 30.954 ms
 tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
-    0.001  \set aid random(1, 100000 * :scale)
-    0.001  \set bid random(1, 1 * :scale)
-    0.001  \set tid random(1, 10 * :scale)
-    0.000  \set delta random(-5000, 5000)
-    0.046  BEGIN;
-    0.151  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-    0.107  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-    4.241  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
-    5.245  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
-    0.102  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
-    0.974  END;
+statement latencies in milliseconds and failures:
+  0.002  0  \set aid random(1, 100000 * :scale)
+  0.005  0  \set bid random(1, 1 * :scale)
+  0.002  0  \set tid random(1, 10 * :scale)
+  0.001  0  \set delta random(-5000, 5000)
+  0.326  0  BEGIN;
+  0.603  0  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+  0.454  0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+  5.528  0  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+  7.335  0  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+  0.371  0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+  1.212  0  END;
 </screen>
+
+   Another example of output for the default script using serializable default
+   transaction isolation level (<command>PGOPTIONS='-c
+   default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: &lt;builtin: TPC-B (sort of)&gt;
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of failed transactions: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+total number of retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.686560 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+  0.002    0      0  \set aid random(1, 100000 * :scale)
+  0.000    0      0  \set bid random(1, 1 * :scale)
+  0.000    0      0  \set tid random(1, 10 * :scale)
+  0.000    0      0  \set delta random(-5000, 5000)
+  0.121    0      0  BEGIN;
+  0.290    0      2  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+  0.221    0      0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+  0.266  212  72127  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+  0.222  112  31170  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+  0.178    0      0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+  1.210    0      0  END;
+ </screen>
   </para>
 
   <para>
-   If multiple script files are specified, the averages are reported
+   If multiple script files are specified, all statistics are reported
    separately for each script file.
   </para>
 
@@ -2373,6 +2560,139 @@ statement latencies in milliseconds:
   </para>
  </refsect2>
 
+ <refsect2 id="failures-and-retries">
+  <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+  <para>
+   When executing <application>pgbench</application>, there are three main types
+   of errors:
+   <itemizedlist>
+     <listitem>
+       <para>
+         Errors of the main program. They are the most serious and always result
+         in an immediate exit from the <application>pgbench</application> with
+         the corresponding error message. They include:
+         <itemizedlist>
+           <listitem>
+             <para>
+               errors at the beginning of the <application>pgbench</application>
+               (e.g. an invalid option value);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               errors in the initialization mode (e.g. the query to create
+               tables for built-in scripts fails);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               errors before starting threads (e.g. we could not connect to the
+               database server / the syntax error in the meta command / thread
+               creation failure);
+             </para>
+           </listitem>
+           <listitem>
+             <para>
+               internal <application>pgbench</application> errors (which are
+               supposed to never occur...).
+             </para>
+           </listitem>
+         </itemizedlist>
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         Errors when the thread manages its clients (e.g. the client could not
+         start a connection to the database server / the socket for connecting
+         the client to the database server has become invalid). In such cases
+         all clients of this thread stop while other threads continue to work.
+       </para>
+     </listitem>
+     <listitem>
+       <para>
+         Direct client errors. They lead to immediate exit from the
+         <application>pgbench</application> with the corresponding error message
+         only in the case of an internal <application>pgbench</application>
+         error (which are supposed to never occur...). Otherwise in the worst
+         case they only lead to the abortion of the failed client while other
+         clients continue their run (but some client errors are handled without
+         an abortion of the client and reported separately, see below). Later in
+         this section it is assumed that the discussed errors are only the
+         direct client errors and they are not internal
+         <application>pgbench</application> errors.
+       </para>
+     </listitem>
+   </itemizedlist>
+  </para>
+
+  <para>
+   A client's run is aborted in case of a serious error; for example, the
+   connection with the database server was lost or the end of the script was
+   reached without completing the last transaction. In addition, if the
+   execution of an SQL or meta command fails for a reason other than a
+   serialization or deadlock error, the client is aborted. Otherwise, if an
+   SQL command fails with a serialization or deadlock error, the client is not
+   aborted. In such cases, the current transaction is rolled back, which also
+   includes resetting the client variables to the values they had before this
+   run of the transaction (it is assumed that one transaction script contains
+   only one transaction; see
+   <xref linkend="transactions-and-scripts" endterm="transactions-and-scripts-title"/>
+   for more information). Transactions with serialization or deadlock errors
+   are repeated after rollbacks until they complete successfully or reach the
+   maximum number of tries (set by <option>--max-tries</option>), the maximum
+   time of retries (set by <option>--latency-limit</option>), or the end of
+   the benchmark (set by <option>--time</option>). If the last try fails, the
+   transaction is reported as failed, but the client is not aborted and
+   continues to run.
+  </para>
+
+  <note>
+   <para>
+    Without specifying the <option>--max-tries</option> option, a transaction
+    is never retried after a serialization or deadlock error because its
+    default value is 1. Use an unlimited number of tries
+    (<literal>--max-tries=0</literal>) together with the
+    <option>--latency-limit</option> option to limit only the maximum time of
+    tries. You can also use the <option>--time</option> option to limit the
+    benchmark duration under an unlimited number of tries.
+   </para>
+   <para>
+    Be careful when repeating scripts that contain multiple transactions: the
+    script is always retried completely, so successful transactions can be
+    performed several times.
+   </para>
+   <para>
+    Be careful when repeating transactions with shell commands. Unlike the
+    results of SQL commands, the results of shell commands are not rolled back,
+    except for the variable value of the <command>\setshell</command> command.
+   </para>
+  </note>
+
+  <para>
+   The latency of a successful transaction includes the entire time of the
+   transaction execution, rollbacks and retries included. The latency of
+   failed transactions and commands is not computed separately.
+  </para>
+
+  <para>
+   The main report contains the number of failed transactions if it is non-zero.
+   If the total number of retried transactions is non-zero, the main report also
+   contains the statistics related to retries: the total number of retried
+   transactions and the total number of retries. The per-script report inherits all
+   these fields from the main report. The per-statement report displays retry
+   statistics only if the <option>--max-tries</option> option is not equal to 1.
+  </para>
+
+  <para>
+   If you want to group failures by basic types in per-transaction and
+   aggregation logs, as well as in the main and per-script reports, use the
+   <option>--failures-detailed</option> option. If you also want to distinguish
+   all errors and failures (errors without retrying) by type, including which
+   limit for retries was violated and how far it was exceeded for the
+   serialization/deadlock failures, use the <option>--verbose-errors</option>
+   option.
+  </para>
+ </refsect2>
+
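The three stopping rules documented above (--max-tries, --latency-limit, --time) can be sketched as a tiny standalone predicate. This is only an illustration of the documented behaviour with hypothetical names; the patch's actual logic lives in doRetry() further down.

```c
#include <assert.h>
#include <stdint.h>

/*
 * May a transaction that hit a serialization/deadlock error be retried?
 * Mirrors the documented limits: --max-tries (0 = unlimited),
 * --latency-limit (0 = unused) and the --time benchmark deadline.
 */
int
may_retry(uint32_t tries, uint32_t max_tries,
		  int64_t elapsed_us, int64_t latency_limit_us,
		  int timer_exceeded)
{
	if (max_tries != 0 && tries >= max_tries)
		return 0;				/* out of tries */
	if (timer_exceeded)
		return 0;				/* benchmark duration is over */
	if (latency_limit_us != 0 && elapsed_us > latency_limit_us)
		return 0;				/* spent too long on this transaction */
	return 1;
}
```

Note that with the default max_tries of 1 the first error immediately fails the transaction, which matches the documented behaviour.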
  <refsect2>
   <title>Good Practices</title>
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3629caba42..1e3d024bde 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
 #define M_PI 3.14159265358979323846
 #endif
 
+#define ERRCODE_T_R_SERIALIZATION_FAILURE  "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED  "40P01"
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
@@ -273,9 +275,34 @@ bool		progress_timestamp = false; /* progress report with Unix time */
 int			nclients = 1;		/* number of clients */
 int			nthreads = 1;		/* number of threads */
 bool		is_connect;			/* establish connection for each transaction */
-bool		report_per_command; /* report per-command latencies */
+bool		report_per_command = false;	/* report per-command latencies, retries
+										 * after errors and failures (errors
+										 * without retrying) */
 int			main_pid;			/* main process id used in log filename */
 
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after a serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32		max_tries = 1;
+
+bool		failures_detailed = false;	/* whether to group failures in reports
+										 * or logs by basic types */
+
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
 typedef struct StatsData
 {
 	pg_time_usec_t start_time;	/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions, including skipped */
+
+	/*
+	 * Transactions are counted depending on their execution and outcome. First
+	 * a transaction may have started or not: skipped transactions occur under
+	 * --rate and --latency-limit when the client is too late to execute them.
+	 * Secondly, a started transaction may ultimately succeed or fail, possibly
+	 * after some retries when --max-tries is not one. Thus
+	 *
+	 * the number of all transactions =
+	 *   'skipped' (it was too late to execute them) +
+	 *   'cnt' (the number of successful transactions) +
+	 *   failed (the number of failed transactions).
+	 *
+	 * A successful transaction can have several unsuccessful tries before a
+	 * successful run. Thus
+	 *
+	 * 'cnt' (the number of successful transactions) =
+	 *   successfully retried transactions (they got a serialization or a
+	 *                                      deadlock error(s), but were
+	 *                                      successfully retried from the very
+	 *                                      beginning) +
+	 *   directly successful transactions (they were successfully completed on
+	 *                                     the first try).
+	 *
+	 * A failed transaction can be one of two types:
+	 *
+	 * failed (the number of failed transactions) =
+	 *   'serialization_failures' (they got a serialization error and were not
+	 *                             successfully retried) +
+	 *   'deadlock_failures' (they got a deadlock error and were not successfully
+	 *                        retried).
+	 *
+	 * If the transaction was retried after a serialization or a deadlock error
+	 * this does not guarantee that this retry was successful. Thus
+	 *
+	 * 'retries' (number of retries) =
+	 *   number of retries in all retried transactions =
+	 *   number of retries in (successfully retried transactions +
+	 *                         failed transactions);
+	 *
+	 * 'retried' (number of all retried transactions) =
+	 *   successfully retried transactions +
+	 *   failed transactions.
+	 */
+	int64		cnt;			/* number of successful transactions, not
+								 * including 'skipped' */
 	int64		skipped;		/* number of transactions skipped under --rate
 								 * and --latency-limit */
+	int64		retries;		/* number of retries after a serialization or a
+								 * deadlock error in all the transactions */
+	int64		retried;		/* number of all transactions that were retried
+								 * after a serialization or a deadlock error
+								 * (perhaps the last try was unsuccessful) */
+	int64		serialization_failures;	/* number of transactions that were not
+										 * successfully retried after a
+										 * serialization error */
+	int64		deadlock_failures;	/* number of transactions that were not
+									 * successfully retried after a deadlock
+									 * error */
 	SimpleStats latency;
 	SimpleStats lag;
 } StatsData;
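The accounting identities spelled out in the comment above can be checked with a simplified standalone mirror of the counters (a sketch: `MiniStats` and the helper names are hypothetical stand-ins, not the real StatsData):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the per-interval counters described above. */
typedef struct MiniStats
{
	int64_t		cnt;			/* successful transactions */
	int64_t		skipped;		/* too late under --rate/--latency-limit */
	int64_t		retries;		/* retries summed over all transactions */
	int64_t		retried;		/* transactions retried at least once */
	int64_t		serialization_failures;
	int64_t		deadlock_failures;
} MiniStats;

/* failed = serialization failures + deadlock failures */
int64_t
mini_failures(const MiniStats *s)
{
	return s->serialization_failures + s->deadlock_failures;
}

/* number of all transactions = skipped + successful + failed */
int64_t
mini_total(const MiniStats *s)
{
	return s->skipped + s->cnt + mini_failures(s);
}
```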
@@ -375,6 +458,30 @@ typedef struct RandomState
 	unsigned short xseed[3];
 } RandomState;
 
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+	RandomState random_state;	/* random seed */
+	Variables   variables;		/* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+	ESTATUS_NO_ERROR = 0,
+	ESTATUS_META_COMMAND_ERROR,
+
+	/* SQL errors */
+	ESTATUS_SERIALIZATION_ERROR,
+	ESTATUS_DEADLOCK_ERROR,
+	ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
 /* Various random sequences are initialized from this one. */
 static RandomState base_random_sequence;
 
@@ -446,6 +553,35 @@ typedef enum
 	CSTATE_END_COMMAND,
 	CSTATE_SKIP_COMMAND,
 
+	/*
+	 * States for failed commands.
+	 *
+	 * If an SQL/meta command fails, CSTATE_ERROR cleans up after the error:
+	 * - clear the conditional stack;
+	 * - if we have an unterminated (possibly failed) transaction block, send
+	 * the rollback command to the server and wait for the result in
+	 * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+	 * go to CSTATE_ABORTED.
+	 *
+	 * But if everything is ok we are ready for future transactions: if this is
+	 * a serialization or deadlock error and we can re-execute the transaction
+	 * from the very beginning, go to CSTATE_RETRY; otherwise go to
+	 * CSTATE_FAILURE.
+	 *
+	 * In CSTATE_RETRY report an error, set the same parameters for the
+	 * transaction execution as in the previous tries and process the first
+	 * transaction command in CSTATE_START_COMMAND.
+	 *
+	 * In CSTATE_FAILURE report a failure, set the parameters for the
+	 * transaction execution as they were before the first run of this
+	 * transaction (except for a random state) and go to CSTATE_END_TX to
+	 * complete this transaction.
+	 */
+	CSTATE_ERROR,
+	CSTATE_WAIT_ROLLBACK_RESULT,
+	CSTATE_RETRY,
+	CSTATE_FAILURE,
+
 	/*
 	 * CSTATE_END_TX performs end-of-transaction processing.  It calculates
 	 * latency, and logs the transaction.  In --connect mode, it closes the
@@ -494,8 +630,20 @@ typedef struct
 
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
+	/*
+	 * For processing failures and repeating transactions with serialization or
+	 * deadlock errors:
+	 */
+	EStatus		estatus;	/* the error status of the current transaction
+							 * execution; this is ESTATUS_NO_ERROR if there were
+							 * no errors */
+	RetryState  retry_state;
+	uint32		tries;		/* how many times have we already tried the
+							 * current transaction? */
+
 	/* per client collected stats */
-	int64		cnt;			/* client transaction count, for -t */
+	int64		cnt;			/* client transaction count, for -t; skipped and
+								 * failed transactions are also counted here */
 } CState;
 
 /*
@@ -590,6 +738,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * aset			do gset on all possible queries of a combined query (\;).
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
+ * retries		Number of retries after a serialization or deadlock error in the
+ *				current command.
+ * failures		Number of errors in the current command that were not retried.
  */
 typedef struct Command
 {
@@ -602,6 +753,8 @@ typedef struct Command
 	char	   *varprefix;
 	PgBenchExpr *expr;
 	SimpleStats stats;
+	int64		retries;
+	int64		failures;
 } Command;
 
 typedef struct ParsedScript
@@ -616,6 +769,8 @@ static ParsedScript sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int64 total_weight = 0;
 
+static bool	verbose_errors = false;	/* print verbose messages of all errors */
+
 /* Builtin test scripts */
 typedef struct BuiltinScript
 {
@@ -753,15 +908,18 @@ usage(void)
 		   "                           protocol for submitting queries (default: simple)\n"
 		   "  -n, --no-vacuum          do not run VACUUM before tests\n"
 		   "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
-		   "  -r, --report-latencies   report average latency per command\n"
+		   "  -r, --report-per-command report latencies, failures and retries per command\n"
 		   "  -R, --rate=NUM           target rate in transactions per second\n"
 		   "  -s, --scale=NUM          report this scale factor in output\n"
 		   "  -t, --transactions=NUM   number of transactions each client runs (default: 10)\n"
 		   "  -T, --time=NUM           duration of benchmark test in seconds\n"
 		   "  -v, --vacuum-all         vacuum all four standard tables before tests\n"
 		   "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
+		   "  --failures-detailed      report the failures grouped by basic types\n"
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
+		   "  --max-tries=NUM          max number of tries to run transaction (default: 1)\n"
+		   "  --verbose-errors         print messages of all errors\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
 		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1307,6 +1465,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
 	sd->start_time = start;
 	sd->cnt = 0;
 	sd->skipped = 0;
+	sd->retries = 0;
+	sd->retried = 0;
+	sd->serialization_failures = 0;
+	sd->deadlock_failures = 0;
 	initSimpleStats(&sd->latency);
 	initSimpleStats(&sd->lag);
 }
@@ -1315,22 +1477,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
  * Accumulate one additional item into the given stats object.
  */
 static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+		   EStatus estatus, int64 tries)
 {
-	stats->cnt++;
-
+	/* Record the skipped transaction */
 	if (skipped)
 	{
 		/* no latency to record on skipped transactions */
 		stats->skipped++;
+		return;
 	}
-	else
+
+	/*
+	 * Record the number of retries regardless of whether the transaction was
+	 * successful or failed.
+	 */
+	if (tries > 1)
 	{
-		addToSimpleStats(&stats->latency, lat);
+		stats->retries += (tries - 1);
+		stats->retried++;
+	}
 
-		/* and possibly the same for schedule lag */
-		if (throttle_delay)
-			addToSimpleStats(&stats->lag, lag);
+	switch (estatus)
+	{
+			/* Record the successful transaction */
+		case ESTATUS_NO_ERROR:
+			stats->cnt++;
+
+			addToSimpleStats(&stats->latency, lat);
+
+			/* and possibly the same for schedule lag */
+			if (throttle_delay)
+				addToSimpleStats(&stats->lag, lag);
+			break;
+
+			/* Record the failed transaction */
+		case ESTATUS_SERIALIZATION_ERROR:
+			stats->serialization_failures++;
+			break;
+		case ESTATUS_DEADLOCK_ERROR:
+			stats->deadlock_failures++;
+			break;
+		default:
+			/* internal error which should never occur */
+			pg_log_fatal("unexpected error status: %d", estatus);
+			exit(1);
 	}
 }
 
@@ -2861,6 +3052,9 @@ preparedStatementName(char *buffer, int file, int state)
 	sprintf(buffer, "P%d_%d", file, state);
 }
 
+/*
+ * Report that the client is aborted while processing SQL commands.
+ */
 static void
 commandFailed(CState *st, const char *cmd, const char *message)
 {
@@ -2868,6 +3062,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
 				 st->id, st->command, cmd, st->use_file, message);
 }
 
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+	Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+	pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+				 st->id, st->command, st->use_file, message);
+}
+
 /* return a script number with a weighted choice. */
 static int
 chooseScript(TState *thread)
@@ -2975,6 +3180,33 @@ sendCommand(CState *st, Command *command)
 		return true;
 }
 
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+	if (sqlState != NULL)
+	{
+		if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+			return ESTATUS_SERIALIZATION_ERROR;
+		else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+			return ESTATUS_DEADLOCK_ERROR;
+	}
+
+	return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+	return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+			estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
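As a quick sanity check of the SQLSTATE mapping above, the same logic can be exercised standalone (duplicated here purely for illustration; the `Mini`-prefixed names are local stand-ins for the patch's macros and enum):

```c
#include <assert.h>
#include <string.h>

#define MINI_SERIALIZATION_FAILURE "40001"	/* ERRCODE_T_R_SERIALIZATION_FAILURE */
#define MINI_DEADLOCK_DETECTED "40P01"		/* ERRCODE_T_R_DEADLOCK_DETECTED */

typedef enum MiniEStatus
{
	MINI_ESTATUS_SERIALIZATION_ERROR,
	MINI_ESTATUS_DEADLOCK_ERROR,
	MINI_ESTATUS_OTHER_SQL_ERROR
} MiniEStatus;

/* Map an SQLSTATE string to an error status, as getSQLErrorStatus() does. */
MiniEStatus
mini_sql_error_status(const char *sqlState)
{
	if (sqlState != NULL)
	{
		if (strcmp(sqlState, MINI_SERIALIZATION_FAILURE) == 0)
			return MINI_ESTATUS_SERIALIZATION_ERROR;
		if (strcmp(sqlState, MINI_DEADLOCK_DETECTED) == 0)
			return MINI_ESTATUS_DEADLOCK_ERROR;
	}
	return MINI_ESTATUS_OTHER_SQL_ERROR;
}

/* Only serialization and deadlock errors are retriable. */
int
mini_can_retry(MiniEStatus s)
{
	return s == MINI_ESTATUS_SERIALIZATION_ERROR ||
		s == MINI_ESTATUS_DEADLOCK_ERROR;
}
```

A NULL SQLSTATE (e.g. a lost connection) maps to the non-retriable "other" status, so the client is aborted rather than retried.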
 /*
  * Process query response from the backend.
  *
@@ -3017,6 +3249,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 				{
 					pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
 								 st->id, st->use_file, st->command, qrynum, 0);
+					st->estatus = ESTATUS_META_COMMAND_ERROR;
 					goto error;
 				}
 				break;
@@ -3031,6 +3264,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 						/* under \gset, report the error */
 						pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
 									 st->id, st->use_file, st->command, qrynum, PQntuples(res));
+						st->estatus = ESTATUS_META_COMMAND_ERROR;
 						goto error;
 					}
 					else if (meta == META_ASET && ntuples <= 0)
@@ -3055,6 +3289,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 							/* internal error */
 							pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
 										 st->id, st->use_file, st->command, qrynum, varname);
+							st->estatus = ESTATUS_META_COMMAND_ERROR;
 							goto error;
 						}
 
@@ -3072,6 +3307,20 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 								 PQerrorMessage(st->con));
 				break;
 
+			case PGRES_NONFATAL_ERROR:
+			case PGRES_FATAL_ERROR:
+				st->estatus = getSQLErrorStatus(
+					PQresultErrorField(res, PG_DIAG_SQLSTATE));
+				if (canRetryError(st->estatus))
+				{
+					if (verbose_errors)
+						commandError(st, PQerrorMessage(st->con));
+					if (PQpipelineStatus(st->con) == PQ_PIPELINE_ABORTED)
+						PQpipelineSync(st->con);
+					goto error;
+				}
+				/* fall through */
+
 			default:
 				/* anything else is unexpected */
 				pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3150,6 +3399,165 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
 	return true;
 }
 
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+	Variable   *vars,
+			   *var;
+	int			nvars;
+
+	if (!variables)
+		return;					/* nothing to do here */
+
+	vars = variables->vars;
+	nvars = variables->nvars;
+	for (var = vars; var - vars < nvars; ++var)
+	{
+		pg_free(var->name);
+		pg_free(var->svalue);
+	}
+
+	variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of variables array.
+ * Before copying, the function frees the string fields of the destination
+ * variables and, if necessary, enlarges their array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+	Variable   *dest_var;
+	const Variable *source_var;
+
+	if (!dest || !source || dest == source)
+		return;					/* nothing to do here */
+
+	/*
+	 * Clear the original variables and make sure that we have enough space for
+	 * the new variables.
+	 */
+	clearVariables(dest);
+	enlargeVariables(dest, source->nvars);
+
+	/* Make a deep copy of variables array */
+	for (source_var = source->vars, dest_var = dest->vars;
+		 source_var - source->vars < source->nvars;
+		 ++source_var, ++dest_var)
+	{
+		dest_var->name = pg_strdup(source_var->name);
+		dest_var->svalue = (source_var->svalue == NULL) ?
+			NULL : pg_strdup(source_var->svalue);
+		dest_var->value = source_var->value;
+	}
+	dest->nvars = source->nvars;
+	dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Returns true if the transaction can be retried after the current error.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+	Assert(st->estatus != ESTATUS_NO_ERROR);
+
+	/* We can only retry serialization or deadlock errors. */
+	if (!canRetryError(st->estatus))
+		return false;
+
+	/*
+	 * We must have at least one option to limit the retrying of transactions
+	 * that got an error.
+	 */
+	Assert(max_tries || latency_limit || duration > 0);
+
+	/*
+	 * We cannot retry the error if we have reached the maximum number of tries.
+	 */
+	if (max_tries && st->tries >= max_tries)
+		return false;
+
+	/*
+	 * We cannot retry the error if the benchmark duration is over.
+	 */
+	if (timer_exceeded)
+		return false;
+
+	/*
+	 * We cannot retry the error if we spent too much time on this transaction.
+	 */
+	if (latency_limit)
+	{
+		pg_time_now_lazy(now);
+		if (*now - st->txn_scheduled > latency_limit)
+			return false;
+	}
+
+	/* OK */
+	return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+	PGTransactionStatusType tx_status;
+
+	tx_status = PQtransactionStatus(con);
+	switch (tx_status)
+	{
+		case PQTRANS_IDLE:
+			*in_tx_block = false;
+			break;
+		case PQTRANS_INTRANS:
+		case PQTRANS_INERROR:
+			*in_tx_block = true;
+			break;
+		case PQTRANS_UNKNOWN:
+			/* PQTRANS_UNKNOWN is expected given a broken connection */
+			if (PQstatus(con) == CONNECTION_BAD)
+			{		/* there's something wrong */
+				pg_log_error("perhaps the backend died while processing");
+				return false;
+			}
+			/* fall through */
+		case PQTRANS_ACTIVE:
+		default:
+			/*
+			 * We cannot find out whether we are in a transaction block or not.
+			 * Internal error which should never occur.
+			 */
+			pg_log_error("unexpected transaction status %d", tx_status);
+			return false;
+	}
+
+	/* OK */
+	return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+	if (!latency_limit)
+		return 0.0;
+
+	pg_time_now_lazy(now);
+	return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
+
 /*
  * Advance the state machine of a connection.
  */
@@ -3179,6 +3587,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	for (;;)
 	{
 		Command    *command;
+		PGresult   *res;
+		bool		in_tx_block;
 
 		switch (st->state)
 		{
@@ -3187,6 +3597,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 				st->use_file = chooseScript(thread);
 				Assert(conditional_stack_empty(st->cstack));
 
+				/* reset transaction variables to default values */
+				st->estatus = ESTATUS_NO_ERROR;
+				st->tries = 1;
+
 				pg_log_debug("client %d executing script \"%s\"",
 							 st->id, sql_script[st->use_file].desc);
 
@@ -3223,6 +3637,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					memset(st->prepared, 0, sizeof(st->prepared));
 				}
 
+				/*
+				 * This is the first try to run this transaction. Remember its
+				 * parameters: maybe it will get an error and we will need to
+				 * run it again.
+				 */
+				st->retry_state.random_state = st->cs_func_rs;
+				copyVariables(&st->retry_state.variables, &st->variables);
+
 				/* record transaction start time */
 				st->txn_begin = now;
 
@@ -3374,6 +3796,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					 * - else CSTATE_END_COMMAND
 					 */
 					st->state = executeMetaCommand(st, &now);
+					if (st->state == CSTATE_ABORTED)
+						st->estatus = ESTATUS_META_COMMAND_ERROR;
 				}
 
 				/*
@@ -3512,6 +3936,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
 						st->state = CSTATE_END_COMMAND;
 				}
+				else if (canRetryError(st->estatus))
+					st->state = CSTATE_ERROR;
 				else
 					st->state = CSTATE_ABORTED;
 				break;
@@ -3558,6 +3984,187 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 					CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
 				break;
 
+				/*
+				 * Clean up after an error.
+				 */
+			case CSTATE_ERROR:
+
+				Assert(st->estatus != ESTATUS_NO_ERROR);
+
+				/* Clear the conditional stack */
+				conditional_stack_reset(st->cstack);
+
+				/*
+				 * Check if we have a (failed) transaction block or not, and
+				 * roll it back if any.
+				 */
+
+				if (!checkTransactionStatus(st->con, &in_tx_block))
+				{
+					/*
+					 * There's something wrong...
+					 * It is assumed that the function checkTransactionStatus
+					 * has already printed a more detailed error message.
+					 */
+					pg_log_error("client %d aborted while receiving the transaction status", st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+
+				if (in_tx_block)
+				{
+					/* Try to rollback a (failed) transaction block. */
+					if (!PQsendQuery(st->con, "ROLLBACK"))
+					{
+						pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+									 st->id);
+						st->state = CSTATE_ABORTED;
+					}
+					else
+						st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+				}
+				else
+				{
+					/* Check if we can retry the error. */
+					st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+				}
+				break;
+
+				/*
+				 * Wait for the rollback command to complete
+				 */
+			case CSTATE_WAIT_ROLLBACK_RESULT:
+				pg_log_debug("client %d receiving", st->id);
+				if (!PQconsumeInput(st->con))
+				{
+					pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+								 st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+				if (PQisBusy(st->con))
+					return;		/* don't have the whole result yet */
+
+				/*
+				 * Read and discard the query result.
+				 */
+				res = PQgetResult(st->con);
+				switch (PQresultStatus(res))
+				{
+					case PGRES_COMMAND_OK:
+						/* OK */
+						PQclear(res);
+						do
+						{
+							res = PQgetResult(st->con);
+							if (res)
+								PQclear(res);
+						} while (res);
+						/* Check if we can retry the error. */
+						st->state =
+							doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+						break;
+					default:
+						pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+									 st->id, PQerrorMessage(st->con));
+						PQclear(res);
+						st->state = CSTATE_ABORTED;
+						break;
+				}
+				break;
+
+				/*
+				 * Retry the transaction after an error.
+				 */
+			case CSTATE_RETRY:
+				command = sql_script[st->use_file].commands[st->command];
+
+				/*
+				 * Inform that the transaction will be retried after the error.
+				 */
+				if (verbose_errors)
+				{
+					PQExpBufferData buf;
+
+					initPQExpBuffer(&buf);
+
+					printfPQExpBuffer(&buf, "client %d repeats the transaction after the error (try %d",
+									  st->id, st->tries);
+					if (max_tries)
+						appendPQExpBuffer(&buf, "/%d", max_tries);
+					if (latency_limit)
+						appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+										  getLatencyUsed(st, &now));
+					appendPQExpBuffer(&buf, ")\n");
+
+					pg_log_info("%s", buf.data);
+
+					termPQExpBuffer(&buf);
+				}
+
+				/* Count tries and retries */
+				st->tries++;
+				if (report_per_command)
+					command->retries++;
+
+				/*
+				 * Reset the execution parameters as they were at the beginning
+				 * of the transaction.
+				 */
+				st->cs_func_rs = st->retry_state.random_state;
+				copyVariables(&st->variables, &st->retry_state.variables);
+
+				/* Process the first transaction command. */
+				st->command = 0;
+				st->estatus = ESTATUS_NO_ERROR;
+				st->state = CSTATE_START_COMMAND;
+				break;
+
+				/*
+				 * Complete the failed transaction.
+				 */
+			case CSTATE_FAILURE:
+				command = sql_script[st->use_file].commands[st->command];
+
+				/* Accumulate the failure. */
+				if (report_per_command)
+					command->failures++;
+
+				/*
+				 * Inform that the failed transaction will not be retried.
+				 */
+				if (verbose_errors)
+				{
+					PQExpBufferData buf;
+
+					initPQExpBuffer(&buf);
+
+					printfPQExpBuffer(&buf, "client %d ends the failed transaction (try %d",
+									  st->id, st->tries);
+					if (max_tries)
+						appendPQExpBuffer(&buf, "/%d", max_tries);
+					if (latency_limit)
+						appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+										  getLatencyUsed(st, &now));
+					else if (timer_exceeded)
+						appendPQExpBuffer(&buf, ", the duration time is exceeded");
+					appendPQExpBuffer(&buf, ")\n");
+
+					pg_log_info("%s", buf.data);
+
+					termPQExpBuffer(&buf);
+				}
+
+				/*
+				 * Reset the execution parameters as they were at the beginning
+				 * of the transaction except for a random state.
+				 */
+				copyVariables(&st->variables, &st->retry_state.variables);
+
+				/* End the failed transaction. */
+				st->state = CSTATE_END_TX;
+				break;
+
 				/*
 				 * End of transaction (end of script, really).
 				 */
@@ -3572,6 +4179,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 				 */
 				Assert(conditional_stack_empty(st->cstack));
 
+				/*
+				 * We must complete all the transaction blocks that were
+				 * started in this script.
+				 */
+				if (!checkTransactionStatus(st->con, &in_tx_block))
+				{
+					/*
+					 * There's something wrong...
+					 * It is assumed that the function checkTransactionStatus
+					 * has already printed a more detailed error message.
+					 */
+					pg_log_error("client %d aborted while receiving the transaction status", st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+				if (in_tx_block)
+				{
+					pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+								 st->id);
+					st->state = CSTATE_ABORTED;
+					break;
+				}
+
 				if (is_connect)
 				{
 					finishCon(st);
@@ -3803,6 +4433,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	return CSTATE_END_COMMAND;
 }
 
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+	return (stats->serialization_failures +
+			stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+	if (skipped)
+		return "skipped";
+	else if (failures_detailed)
+	{
+		switch (estatus)
+		{
+			case ESTATUS_SERIALIZATION_ERROR:
+				return "serialization_failure";
+			case ESTATUS_DEADLOCK_ERROR:
+				return "deadlock_failure";
+			default:
+				/* internal error which should never occur */
+				pg_log_fatal("unexpected error status: %d", estatus);
+				exit(1);
+		}
+	}
+	else
+		return "failed";
+}
+
 /*
  * Print log entry after completing one transaction.
  *
@@ -3847,6 +4514,14 @@ doLog(TState *thread, CState *st,
 					agg->latency.sum2,
 					agg->latency.min,
 					agg->latency.max);
+
+			if (failures_detailed)
+				fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+						agg->serialization_failures,
+						agg->deadlock_failures);
+			else
+				fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
 			if (throttle_delay)
 			{
 				fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3857,6 +4532,10 @@ doLog(TState *thread, CState *st,
 				if (latency_limit)
 					fprintf(logfile, " " INT64_FORMAT, agg->skipped);
 			}
+			if (max_tries != 1)
+				fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+						agg->retried,
+						agg->retries);
 			fputc('\n', logfile);
 
 			/* reset data and move to next interval */
@@ -3864,22 +4543,26 @@ doLog(TState *thread, CState *st,
 		}
 
 		/* accumulate the current transaction */
-		accumStats(agg, skipped, latency, lag);
+		accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
 	}
 	else
 	{
 		/* no, print raw transactions */
-		if (skipped)
-			fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
-					INT64_FORMAT,
-					st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
-		else
+		if (!skipped && st->estatus == ESTATUS_NO_ERROR)
 			fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
 					INT64_FORMAT,
 					st->id, st->cnt, latency, st->use_file,
 					now / 1000000, now % 1000000);
+		else
+			fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+					INT64_FORMAT,
+					st->id, st->cnt, getResultString(skipped, st->estatus),
+					st->use_file, now / 1000000, now % 1000000);
+
 		if (throttle_delay)
 			fprintf(logfile, " %.0f", lag);
+		if (max_tries != 1)
+			fprintf(logfile, " %d", st->tries - 1);
 		fputc('\n', logfile);
 	}
 }
@@ -3888,7 +4571,8 @@ doLog(TState *thread, CState *st,
  * Accumulate and report statistics at end of a transaction.
  *
  * (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
  */
 static void
 processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3896,10 +4580,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 {
 	double		latency = 0.0,
 				lag = 0.0;
-	bool		thread_details = progress || throttle_delay || latency_limit,
-				detailed = thread_details || use_log || per_script_stats;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+						   use_log || per_script_stats;
 
-	if (detailed && !skipped)
+	if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
 	{
 		pg_time_now_lazy(now);
 
@@ -3908,20 +4592,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 		lag = st->txn_begin - st->txn_scheduled;
 	}
 
-	if (thread_details)
-	{
-		/* keep detailed thread stats */
-		accumStats(&thread->stats, skipped, latency, lag);
+	/* keep detailed thread stats */
+	accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
 
-		/* count transactions over the latency limit, if needed */
-		if (latency_limit && latency > latency_limit)
-			thread->latency_late++;
-	}
-	else
-	{
-		/* no detailed stats, just count */
-		thread->stats.cnt++;
-	}
+	/* count transactions over the latency limit, if needed */
+	if (latency_limit && latency > latency_limit)
+		thread->latency_late++;
 
 	/* client stat is just counting */
 	st->cnt++;
@@ -3931,7 +4607,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 
 	/* XXX could use a mutex here, but we choose not to */
 	if (per_script_stats)
-		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+				   st->estatus, st->tries);
 }
 
 
@@ -4778,6 +5455,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
 	my_command->type = SQL_COMMAND;
 	my_command->meta = META_NONE;
 	my_command->argc = 0;
+	my_command->retries = 0;
+	my_command->failures = 0;
 	memset(my_command->argv, 0, sizeof(my_command->argv));
 	my_command->varprefix = NULL;	/* allocated later, if needed */
 	my_command->expr = NULL;
@@ -5446,7 +6125,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 {
 	/* generate and show report */
 	pg_time_usec_t run = now - *last_report;
-	int64		ntx;
+	int64		cnt,
+				failures,
+				retried;
 	double		tps,
 				total_run,
 				latency,
@@ -5473,23 +6154,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 		mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
 		cur.cnt += threads[i].stats.cnt;
 		cur.skipped += threads[i].stats.skipped;
+		cur.retries += threads[i].stats.retries;
+		cur.retried += threads[i].stats.retried;
+		cur.serialization_failures +=
+			threads[i].stats.serialization_failures;
+		cur.deadlock_failures += threads[i].stats.deadlock_failures;
 	}
 
 	/* we count only actually executed transactions */
-	ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+	cnt = cur.cnt - last->cnt;
 	total_run = (now - test_start) / 1000000.0;
-	tps = 1000000.0 * ntx / run;
-	if (ntx > 0)
+	tps = 1000000.0 * cnt / run;
+	if (cnt > 0)
 	{
-		latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
-		sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+		latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+		sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
 		stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-		lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+		lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
 	}
 	else
 	{
 		latency = sqlat = stdev = lag = 0;
 	}
+	failures = getFailures(&cur) - getFailures(last);
+	retried = cur.retried - last->retried;
 
 	if (progress_timestamp)
 	{
@@ -5502,8 +6190,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 	}
 
 	fprintf(stderr,
-			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
-			tbuf, tps, latency, stdev);
+			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+			tbuf, tps, latency, stdev, failures);
 
 	if (throttle_delay)
 	{
@@ -5512,6 +6200,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 			fprintf(stderr, ", " INT64_FORMAT " skipped",
 					cur.skipped - last->skipped);
 	}
+
+	/* it can be non-zero only if max_tries is not equal to one */
+	if (max_tries != 1)
+		fprintf(stderr,
+				", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+				retried, cur.retries - last->retries);
 	fprintf(stderr, "\n");
 
 	*last = cur;
@@ -5571,9 +6265,10 @@ printResults(StatsData *total,
 			 int64 latency_late)
 {
 	/* tps is about actually executed transactions during benchmarking */
-	int64		ntx = total->cnt - total->skipped;
+	int64		failures = getFailures(total);
+	int64		total_cnt = total->cnt + total->skipped + failures;
 	double		bench_duration = PG_TIME_GET_DOUBLE(total_duration);
-	double		tps = ntx / bench_duration;
+	double		tps = total->cnt / bench_duration;
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -5590,35 +6285,65 @@ printResults(StatsData *total,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   ntx, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
 		printf("duration: %d s\n", duration);
 		printf("number of transactions actually processed: " INT64_FORMAT "\n",
-			   ntx);
+			   total->cnt);
 	}
 
+	if (failures > 0)
+	{
+		printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+			   failures, 100.0 * failures / total_cnt);
+
+		if (failures_detailed)
+		{
+			if (total->serialization_failures)
+				printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+					   total->serialization_failures,
+					   100.0 * total->serialization_failures / total_cnt);
+			if (total->deadlock_failures)
+				printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+					   total->deadlock_failures,
+					   100.0 * total->deadlock_failures / total_cnt);
+		}
+	}
+
+	/* it can be non-zero only if max_tries is not equal to one */
+	if (total->retried > 0)
+	{
+		printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+			   total->retried, 100.0 * total->retried / total_cnt);
+		printf("total number of retries: " INT64_FORMAT "\n", total->retries);
+	}
+
+	if (max_tries)
+		printf("maximum number of tries: %d\n", max_tries);
+
 	/* Remaining stats are nonsensical if we failed to execute any xacts */
-	if (total->cnt <= 0)
+	if (total->cnt + total->skipped <= 0)
 		return;
 
 	if (throttle_delay && latency_limit)
 		printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
-			   total->skipped, 100.0 * total->skipped / total->cnt);
+			   total->skipped, 100.0 * total->skipped / total_cnt);
 
 	if (latency_limit)
 		printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
-			   latency_limit / 1000.0, latency_late, ntx,
-			   (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+			   latency_limit / 1000.0, latency_late, total->cnt,
+			   (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
 
 	if (throttle_delay || progress || latency_limit)
 		printSimpleStats("latency", &total->latency);
 	else
 	{
 		/* no measurement, show average latency computed from run time */
-		printf("latency average = %.3f ms\n",
-			   0.001 * total_duration * nclients / total->cnt);
+		printf("latency average = %.3f ms%s\n",
+			   0.001 * total_duration * nclients / total_cnt,
+			   failures > 0 ? " (including failures)" : "");
 	}
 
 	if (throttle_delay)
@@ -5644,7 +6369,7 @@ printResults(StatsData *total,
 	 */
 	if (is_connect)
 	{
-		printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+		printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
 		printf("tps = %f (including reconnection times)\n", tps);
 	}
 	else
@@ -5663,6 +6388,9 @@ printResults(StatsData *total,
 			if (per_script_stats)
 			{
 				StatsData  *sstats = &sql_script[i].stats;
+				int64		script_failures = getFailures(sstats);
+				int64		script_total_cnt =
+					sstats->cnt + sstats->skipped + script_failures;
 
 				printf("SQL script %d: %s\n"
 					   " - weight: %d (targets %.1f%% of total)\n"
@@ -5672,25 +6400,60 @@ printResults(StatsData *total,
 					   100.0 * sql_script[i].weight / total_weight,
 					   sstats->cnt,
 					   100.0 * sstats->cnt / total->cnt,
-					   (sstats->cnt - sstats->skipped) / bench_duration);
+					   sstats->cnt / bench_duration);
 
-				if (throttle_delay && latency_limit && sstats->cnt > 0)
+				if (failures > 0)
+				{
+					printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+						   script_failures,
+						   100.0 * script_failures / script_total_cnt);
+
+					if (failures_detailed)
+					{
+						if (total->serialization_failures)
+							printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+								   sstats->serialization_failures,
+								   (100.0 * sstats->serialization_failures /
+									script_total_cnt));
+						if (total->deadlock_failures)
+							printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+								   sstats->deadlock_failures,
+								   (100.0 * sstats->deadlock_failures /
+									script_total_cnt));
+					}
+				}
+
+				/* it can be non-zero only if max_tries is not equal to one */
+				if (total->retried > 0)
+				{
+					printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+						   sstats->retried,
+						   100.0 * sstats->retried / script_total_cnt);
+					printf(" - total number of retries: " INT64_FORMAT "\n",
+						   sstats->retries);
+				}
+
+				if (throttle_delay && latency_limit && script_total_cnt > 0)
 					printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
 						   sstats->skipped,
-						   100.0 * sstats->skipped / sstats->cnt);
+						   100.0 * sstats->skipped / script_total_cnt);
 
 				printSimpleStats(" - latency", &sstats->latency);
 			}
 
-			/* Report per-command latencies */
+			/*
+			 * Report per-command statistics: latencies, retries after errors,
+			 * failures (errors without retrying).
+			 */
 			if (report_per_command)
 			{
 				Command   **commands;
 
-				if (per_script_stats)
-					printf(" - statement latencies in milliseconds:\n");
-				else
-					printf("statement latencies in milliseconds:\n");
+				printf("%sstatement latencies in milliseconds%s:\n",
+					   per_script_stats ? " - " : "",
+					   (max_tries == 1 ?
+						" and failures" :
+						", failures and retries"));
 
 				for (commands = sql_script[i].commands;
 					 *commands != NULL;
@@ -5698,10 +6461,19 @@ printResults(StatsData *total,
 				{
 					SimpleStats *cstats = &(*commands)->stats;
 
-					printf("   %11.3f  %s\n",
-						   (cstats->count > 0) ?
-						   1000.0 * cstats->sum / cstats->count : 0.0,
-						   (*commands)->first_line);
+					if (max_tries == 1)
+						printf("   %11.3f  %10" INT64_MODIFIER "d  %s\n",
+							   (cstats->count > 0) ?
+							   1000.0 * cstats->sum / cstats->count : 0.0,
+							   (*commands)->failures,
+							   (*commands)->first_line);
+					else
+						printf("   %11.3f  %10" INT64_MODIFIER "d  %10" INT64_MODIFIER "d  %s\n",
+							   (cstats->count > 0) ?
+							   1000.0 * cstats->sum / cstats->count : 0.0,
+							   (*commands)->failures,
+							   (*commands)->retries,
+							   (*commands)->first_line);
 				}
 			}
 		}
@@ -5782,7 +6554,7 @@ main(int argc, char **argv)
 		{"progress", required_argument, NULL, 'P'},
 		{"protocol", required_argument, NULL, 'M'},
 		{"quiet", no_argument, NULL, 'q'},
-		{"report-latencies", no_argument, NULL, 'r'},
+		{"report-per-command", no_argument, NULL, 'r'},
 		{"rate", required_argument, NULL, 'R'},
 		{"scale", required_argument, NULL, 's'},
 		{"select-only", no_argument, NULL, 'S'},
@@ -5804,6 +6576,9 @@ main(int argc, char **argv)
 		{"show-script", required_argument, NULL, 10},
 		{"partitions", required_argument, NULL, 11},
 		{"partition-method", required_argument, NULL, 12},
+		{"failures-detailed", no_argument, NULL, 13},
+		{"max-tries", required_argument, NULL, 14},
+		{"verbose-errors", no_argument, NULL, 15},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6172,6 +6947,28 @@ main(int argc, char **argv)
 					exit(1);
 				}
 				break;
+			case 13:			/* failures-detailed */
+				benchmarking_option_set = true;
+				failures_detailed = true;
+				break;
+			case 14:			/* max-tries */
+				{
+					int32		max_tries_arg = atoi(optarg);
+
+					if (max_tries_arg < 0)
+					{
+						pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+						exit(1);
+					}
+
+					benchmarking_option_set = true;
+					max_tries = (uint32) max_tries_arg;
+				}
+				break;
+			case 15:			/* verbose-errors */
+				benchmarking_option_set = true;
+				verbose_errors = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -6353,6 +7150,15 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (!max_tries)
+	{
+		if (!latency_limit && duration <= 0)
+		{
+			pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+			exit(1);
+		}
+	}
+
 	/*
 	 * save main process id in the global variable because process id will be
 	 * changed after fork.
@@ -6561,6 +7367,10 @@ main(int argc, char **argv)
 		mergeSimpleStats(&stats.lag, &thread->stats.lag);
 		stats.cnt += thread->stats.cnt;
 		stats.skipped += thread->stats.skipped;
+		stats.retries += thread->stats.retries;
+		stats.retried += thread->stats.retried;
+		stats.serialization_failures += thread->stats.serialization_failures;
+		stats.deadlock_failures += thread->stats.deadlock_failures;
 		latency_late += thread->latency_late;
 		conn_total_duration += thread->conn_duration;
 
@@ -6709,7 +7519,8 @@ threadRun(void *arg)
 				if (min_usec > this_usec)
 					min_usec = this_usec;
 			}
-			else if (st->state == CSTATE_WAIT_RESULT)
+			else if (st->state == CSTATE_WAIT_RESULT ||
+					 st->state == CSTATE_WAIT_ROLLBACK_RESULT)
 			{
 				/*
 				 * waiting for result from server - nothing to do unless the
@@ -6798,7 +7609,8 @@ threadRun(void *arg)
 		{
 			CState	   *st = &state[i];
 
-			if (st->state == CSTATE_WAIT_RESULT)
+			if (st->state == CSTATE_WAIT_RESULT ||
+				st->state == CSTATE_WAIT_ROLLBACK_RESULT)
 			{
 				/* don't call advanceConnectionState unless data is available */
 				int			sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..f3351b51cf 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
 
 # start a pgbench specific server
 my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
 $node->start;
 
 # invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
 		qr{builtin: TPC-B},
 		qr{clients: 2\b},
 		qr{processed: 10/10},
-		qr{mode: simple}
+		qr{mode: simple},
+		qr{maximum number of tries: 1}
 	],
 	[qr{^$}],
 	'pgbench tpcb-like');
@@ -1239,6 +1244,214 @@ pgbench(
 check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
 	qr{^0 \d{1,2} \d+ \d \d+ \d+$});
 
+# the client aborts if the script contains an incomplete transaction block
+pgbench(
+	'--no-vacuum', 2, [ qr{processed: 1/10} ],
+	[ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+	'incomplete transaction block',
+	{ '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE first_client_table (value integer); '
+  . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+  . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we get a serialization error, and that the delta variable keeps
+# the same random value in the next try
+my $err_pattern =
+    "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+  . "ERROR:  could not serialize access due to concurrent update\\b.*"
+  . "\\g1";
+
+pgbench(
+	"-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+	0,
+	[ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+	  qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+	[ qr/$err_pattern/s ],
+	'concurrent update with retrying',
+	{
+		'001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+  exists boolean;
+  waiters integer;
+BEGIN
+  -- The second client always comes in second, and the number of rows in the
+  -- table first_client_table reflects this. Here the first client inserts a
+  -- so the second client will see a non-empty table when repeating the
+  -- transaction after the serialization error.
+  SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+  IF NOT exists THEN
+	-- Let the second client begin
+	PERFORM pg_advisory_unlock(0);
+	-- And wait until the second client tries to get the same lock
+	LOOP
+	  SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+	  locktype = 'advisory' AND objsubid = 1 AND
+	  ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+	  IF waiters = 1 THEN
+		INSERT INTO first_client_table VALUES (1);
+
+		-- Exit loop
+		EXIT;
+	  END IF;
+	END LOOP;
+  END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+	});
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+	"client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+  . "ERROR:  deadlock detected\\b";
+
+pgbench(
+	"-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+	0,
+	[ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+	  qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+	[ qr{$err_pattern} ],
+	'deadlock with retrying',
+	{
+		'001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+  exists boolean;
+BEGIN
+  SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+  IF exists THEN
+	-- We are the second client or the client with the deadlock error
+
+	-- The first client will take care of this lock by itself (see below)
+	PERFORM pg_advisory_unlock(0);
+
+	PERFORM pg_advisory_lock(3);
+
+	-- The second client can get a deadlock here
+	PERFORM pg_advisory_lock(2);
+  ELSE
+	-- We are the first client
+
+	-- This code should not be used in a new transaction after an error
+	INSERT INTO first_client_table VALUES (1);
+
+	PERFORM pg_advisory_lock(2);
+  END IF;
+END$$;
+
+DO $$
+DECLARE
+  num_rows integer;
+  waiters integer;
+BEGIN
+  -- Check if we are the first client
+  SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+  IF num_rows = 1 THEN
+	-- This code should not be used in a new transaction after an error
+	INSERT INTO first_client_table VALUES (2);
+
+	-- Let the second client begin
+	PERFORM pg_advisory_unlock(0);
+	PERFORM pg_advisory_unlock(1);
+
+	-- Make sure the second client is ready for deadlock
+	LOOP
+	  SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+	  locktype = 'advisory' AND
+	  objsubid = 1 AND
+	  ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+	  NOT granted;
+
+	  IF waiters = 1 THEN
+	    -- Exit loop
+		EXIT;
+	  END IF;
+	END LOOP;
+
+	PERFORM pg_advisory_lock(0);
+	-- And the second client takes care of lock 1 by itself
+  END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+	});
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
 # done
 $node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
 $node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 346a2667fc..56f7226c8e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -178,6 +178,16 @@ my @options = (
 		'-i --partition-method=hash',
 		[qr{partition-method requires greater than zero --partitions}]
 	],
+	[
+		'bad maximum number of tries',
+		'--max-tries -10',
+		[qr{invalid number of maximum tries: "-10"}]
+	],
+	[
+		'an infinite number of tries',
+		'--max-tries 0',
+		[qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+	],
 
 	# logging sub-options
 	[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
 }
 
 /*
- * destroy stack
+ * Destroy all the elements of the stack. The stack itself is not freed.
  */
 void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
 {
+	if (!cstack)
+		return;					/* nothing to do here */
+
 	while (conditional_stack_pop(cstack))
 		continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+	conditional_stack_reset(cstack);
 	free(cstack);
 }
 
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
 
 extern ConditionalStack conditional_stack_create(void);
 
+extern void conditional_stack_reset(ConditionalStack cstack);
+
 extern void conditional_stack_destroy(ConditionalStack cstack);
 
 extern int	conditional_stack_depth(ConditionalStack cstack);
-- 
2.17.1

>From e83a30aea551ed3af2e5a90bbff887953e717002 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v14 1/2] Pgbench errors: use the Variables structure for
 client variables

This is most useful for resetting client variables when a transaction is
repeated after a serialization or deadlock failure.

Don't allocate Variable structs one by one. Instead, grow the array by a
constant margin each time it overflows.
---
 src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
 1 file changed, 100 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4aeccd93af..3629caba42 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -304,6 +310,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables   variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1418,39 +1440,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
 	return (Variable *) bsearch((void *) &key,
-								(void *) st->variables,
-								st->nvariables,
+								(void *) variables->vars,
+								variables->nvars,
 								sizeof(Variable),
 								compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
 {
 	Variable   *var;
 	char		stringform[64];
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 		return NULL;			/* not found */
 
@@ -1582,21 +1604,37 @@ valid_variable_name(const char *name)
 	return true;
 }
 
+/*
+ * Make sure there is enough space for 'needed' more variables in the
+ * variables array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+	/* total number of variables required now */
+	needed += variables->nvars;
+
+	if (variables->max_vars < needed)
+	{
+		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+		variables->vars = (Variable *)
+			pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+	}
+}
+
 /*
  * Lookup a variable by name, creating it if need be.
  * Caller is expected to assign a value to the variable.
  * Returns NULL on failure (bad name).
  */
 static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
 {
 	Variable   *var;
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 	{
-		Variable   *newvars;
-
 		/*
 		 * Check for the name only when declaring a new variable to avoid
 		 * overhead.
@@ -1608,23 +1646,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
 		}
 
 		/* Create variable at the end of the array */
-		if (st->variables)
-			newvars = (Variable *) pg_realloc(st->variables,
-											  (st->nvariables + 1) * sizeof(Variable));
-		else
-			newvars = (Variable *) pg_malloc(sizeof(Variable));
-
-		st->variables = newvars;
+		enlargeVariables(variables, 1);
 
-		var = &newvars[st->nvariables];
+		var = &(variables->vars[variables->nvars]);
 
 		var->name = pg_strdup(name);
 		var->svalue = NULL;
 		/* caller is expected to initialize remaining fields */
 
-		st->nvariables++;
+		variables->nvars++;
 		/* we don't re-sort the array till we have to */
-		st->vars_sorted = false;
+		variables->vars_sorted = false;
 	}
 
 	return var;
@@ -1633,12 +1665,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
 /* Assign a string value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+			const char *value)
 {
 	Variable   *var;
 	char	   *val;
 
-	var = lookupCreateVariable(st, context, name);
+	var = lookupCreateVariable(variables, context, name);
 	if (!var)
 		return false;
 
@@ -1656,12 +1689,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
 /* Assign a value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
 				 const PgBenchValue *value)
 {
 	Variable   *var;
 
-	var = lookupCreateVariable(st, context, name);
+	var = lookupCreateVariable(variables, context, name);
 	if (!var)
 		return false;
 
@@ -1676,12 +1709,13 @@ putVariableValue(CState *st, const char *context, char *name,
 /* Assign an integer value to a variable, creating it if need be */
 /* Returns false on failure (bad name) */
 static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+			   int64 value)
 {
 	PgBenchValue val;
 
 	setIntValue(&val, value);
-	return putVariableValue(st, context, name, &val);
+	return putVariableValue(variables, context, name, &val);
 }
 
 /*
@@ -1740,7 +1774,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
 }
 
 static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
 {
 	char	   *p,
 			   *name,
@@ -1761,7 +1795,7 @@ assignVariables(CState *st, char *sql)
 			continue;
 		}
 
-		val = getVariable(st, name);
+		val = getVariable(variables, name);
 		free(name);
 		if (val == NULL)
 		{
@@ -1776,12 +1810,13 @@ assignVariables(CState *st, char *sql)
 }
 
 static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+			   const char **params)
 {
 	int			i;
 
 	for (i = 0; i < command->argc - 1; i++)
-		params[i] = getVariable(st, command->argv[i + 1]);
+		params[i] = getVariable(variables, command->argv[i + 1]);
 }
 
 static char *
@@ -2649,7 +2684,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
 			{
 				Variable   *var;
 
-				if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+				if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
 				{
 					pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
 					return false;
@@ -2719,7 +2754,7 @@ getMetaCommand(const char *cmd)
  * Return true if succeeded, or false on error.
  */
 static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
 {
 	char		command[SHELL_COMMAND_SIZE];
 	int			i,
@@ -2750,7 +2785,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 		{
 			arg = argv[i] + 1;	/* a string literal starting with colons */
 		}
-		else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+		else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
 		{
 			pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
 			return false;
@@ -2811,7 +2846,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 		pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
 		return false;
 	}
-	if (!putVariableInt(st, "setshell", variable, retval))
+	if (!putVariableInt(variables, "setshell", variable, retval))
 		return false;
 
 	pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2863,7 +2898,7 @@ sendCommand(CState *st, Command *command)
 		char	   *sql;
 
 		sql = pg_strdup(command->argv[0]);
-		sql = assignVariables(st, sql);
+		sql = assignVariables(&st->variables, sql);
 
 		pg_log_debug("client %d sending %s", st->id, sql);
 		r = PQsendQuery(st->con, sql);
@@ -2874,7 +2909,7 @@ sendCommand(CState *st, Command *command)
 		const char *sql = command->argv[0];
 		const char *params[MAX_ARGS];
 
-		getQueryParams(st, command, params);
+		getQueryParams(&st->variables, command, params);
 
 		pg_log_debug("client %d sending %s", st->id, sql);
 		r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2921,7 +2956,7 @@ sendCommand(CState *st, Command *command)
 			st->prepared[st->use_file] = true;
 		}
 
-		getQueryParams(st, command, params);
+		getQueryParams(&st->variables, command, params);
 		preparedStatementName(name, st->use_file, st->command);
 
 		pg_log_debug("client %d sending %s", st->id, name);
@@ -3014,7 +3049,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 							varname = psprintf("%s%s", varprefix, varname);
 
 						/* store last row result as a string */
-						if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+						if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
 										 PQgetvalue(res, ntuples - 1, fld)))
 						{
 							/* internal error */
@@ -3075,14 +3110,14 @@ error:
  * of delay, in microseconds.  Returns true on success, false on error.
  */
 static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
 {
 	char	   *var;
 	int			usec;
 
 	if (*argv[1] == ':')
 	{
-		if ((var = getVariable(st, argv[1] + 1)) == NULL)
+		if ((var = getVariable(variables, argv[1] + 1)) == NULL)
 		{
 			pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
 			return false;
@@ -3614,7 +3649,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 		 * latency will be recorded in CSTATE_SLEEP state, not here, after the
 		 * delay has elapsed.)
 		 */
-		if (!evaluateSleep(st, argc, argv, &usec))
+		if (!evaluateSleep(&st->variables, argc, argv, &usec))
 		{
 			commandFailed(st, "sleep", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3635,7 +3670,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 			return CSTATE_ABORTED;
 		}
 
-		if (!putVariableValue(st, argv[0], argv[1], &result))
+		if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
 		{
 			commandFailed(st, "set", "assignment of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3705,7 +3740,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	}
 	else if (command->meta == META_SETSHELL)
 	{
-		if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+		if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
 		{
 			commandFailed(st, "setshell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3713,7 +3748,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	}
 	else if (command->meta == META_SHELL)
 	{
-		if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+		if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
 		{
 			commandFailed(st, "shell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -5995,7 +6030,7 @@ main(int argc, char **argv)
 					}
 
 					*p++ = '\0';
-					if (!putVariable(&state[0], "option", optarg, p))
+					if (!putVariable(&state[0].variables, "option", optarg, p))
 						exit(1);
 				}
 				break;
@@ -6335,19 +6370,19 @@ main(int argc, char **argv)
 			int			j;
 
 			state[i].id = i;
-			for (j = 0; j < state[0].nvariables; j++)
+			for (j = 0; j < state[0].variables.nvars; j++)
 			{
-				Variable   *var = &state[0].variables[j];
+				Variable   *var = &state[0].variables.vars[j];
 
 				if (var->value.type != PGBT_NO_VALUE)
 				{
-					if (!putVariableValue(&state[i], "startup",
+					if (!putVariableValue(&state[i].variables, "startup",
 										  var->name, &var->value))
 						exit(1);
 				}
 				else
 				{
-					if (!putVariable(&state[i], "startup",
+					if (!putVariable(&state[i].variables, "startup",
 									 var->name, var->svalue))
 						exit(1);
 				}
@@ -6382,11 +6417,11 @@ main(int argc, char **argv)
 	 * :scale variables normally get -s or database scale, but don't override
 	 * an explicit -D switch
 	 */
-	if (lookupVariable(&state[0], "scale") == NULL)
+	if (lookupVariable(&state[0].variables, "scale") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
 		{
-			if (!putVariableInt(&state[i], "startup", "scale", scale))
+			if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
 				exit(1);
 		}
 	}
@@ -6395,30 +6430,32 @@ main(int argc, char **argv)
 	 * Define a :client_id variable that is unique per connection. But don't
 	 * override an explicit -D switch.
 	 */
-	if (lookupVariable(&state[0], "client_id") == NULL)
+	if (lookupVariable(&state[0].variables, "client_id") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "client_id", i))
+			if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
 				exit(1);
 	}
 
 	/* set default seed for hash functions */
-	if (lookupVariable(&state[0], "default_seed") == NULL)
+	if (lookupVariable(&state[0].variables, "default_seed") == NULL)
 	{
 		uint64		seed =
 		((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
 		(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
 
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+			if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+								(int64) seed))
 				exit(1);
 	}
 
 	/* set random seed unless overwritten */
-	if (lookupVariable(&state[0], "random_seed") == NULL)
+	if (lookupVariable(&state[0].variables, "random_seed") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+			if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+								random_seed))
 				exit(1);
 	}
 
-- 
2.17.1