On 11-09-2018 18:29, Fabien COELHO wrote:
Hello Marina,

Hmm, but we can say the same for serialization or deadlock errors that were not retried (the client test code itself could not run correctly or the SQL sent was somehow wrong, which is also the client's fault), can't we?

I think not.

If a client asks for something "legal", but some other client in
parallel happens to make an incompatible change which results in a
serialization or deadlock error, the clients are not responsible for
the raised errors; it is just that they happen to ask for something
incompatible at the same time. So there is no user error per se, but
the server is reporting its (temporary) inability to process what was
asked for. For these errors, retrying is fine. If the client were
alone, there would be no such errors: you cannot deadlock with
yourself. This is really an isolation issue linked to parallel
execution.
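As an illustration, a minimal script along these lines (after a standard pgbench -i initialization; the small aid range and the file name below are only for the example) produces exactly such retryable errors once two clients run it concurrently, while a single client never hits them:

\set aid random(1, 10)
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
END;

Run with something like "pgbench -n -c 2 -T 10 -f concurrent_update.sql", the client that loses the race on a row gets "ERROR:  could not serialize access due to concurrent update", which is precisely the kind of failure worth retrying.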

You can get other errors that cannot happen with only one client if you use shell commands in meta-commands:

starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.636           0  BEGIN;
         1.497           0  \setshell var mkdir my_directory && echo 1
         0.007           0  \sleep 1 us
         1.465           0  \setshell var rmdir my_directory && echo 1
         1.622           0  END;

starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
         2.731           0  BEGIN;
         2.909           1  \setshell var mkdir my_directory && echo 1
         0.231           0  \sleep 1 us
         2.366           0  \setshell var rmdir my_directory && echo 1
         2.664           0  END;
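For reference, the script behind these two runs, as reconstructed from the per-statement report above (the attached file may differ in minor details), is essentially:

BEGIN;
\setshell var mkdir my_directory && echo 1
\sleep 1 us
\setshell var rmdir my_directory && echo 1
END;

With two clients, one client's mkdir can run while the directory created by the other client has not yet been removed, which is exactly the "File exists" failure shown above; a single client running this script alone never sees it.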

Or if you use untrusted procedural languages in SQL expressions (see the attached file, included at the end of this message):

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

client 1 got an error in command 0 (SQL) of script 0; ERROR: could not create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.566           2  DO $$

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.893           0  DO $$

Or if you try to create a function and perhaps replace an existing one:

starting vacuum...end.
client 0 got an error in command 0 (SQL) of script 0; ERROR: duplicate key value violates unique constraint "pg_proc_proname_args_nsp_index" DETAIL: Key (proname, proargtypes, pronamespace)=(my_function, , 2200) already exists.

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated

transaction type: pgbench_create_function.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failures: 10 (50.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 10 (50.000%)
maximum number of tries: 1
latency average = 82.881 ms (including failures)
tps = 12.065492 (including connections establishing)
tps = 12.092216 (excluding connections establishing)
statement latencies in milliseconds and failures:
        82.549          10  CREATE OR REPLACE FUNCTION my_function() RETURNS integer AS 'select 1;' LANGUAGE SQL;

Why not handle client errors that can occur (but may also not occur) in the same way? (For example, always abort the client, or conversely never abort in these cases.) Here's an example of such an error:

client 5 got an error in command 1 (SQL) of script 0; ERROR: division by zero

This is an interesting case. For me we must stop the script because
the client is asking for something "stupid", and retrying the same
thing won't change the outcome: the division will still be by zero. It
is the client's responsibility not to ask for something stupid; the
bench script is buggy and should not submit illegal SQL queries. This
is quite different from submitting something legal which happens to fail.
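For example, a deliberately broken script statement such as

SELECT 1 / 0;

raises "ERROR:  division by zero" on every single execution, whether one client or many are running; no number of retries can make it succeed, so retrying would only repeat the same error.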
...
I'm not sure that having "--debug" imply this option
is useful: as there are two distinct options, shouldn't the user be
allowed to trigger one or the other as they wish?

I'm not sure that the main debugging output will give a good clue of what's happened without full messages about errors, retries and failures...

I'm mostly arguing for letting the user decide what they want.

These lines are quite long - do you suggest wrapping them this way?

Sure, if it is too long, then wrap.

Ok!

The name of the function getTransactionStatus does not seem to correspond fully to what the function does. There is a passthru case which should either be avoided or clearly commented.

I don't quite understand you - do you mean that in fact this function finds out whether we are in a (failed) transaction block or not? Or do you mean that the case of PQTRANS_INTRANS is also ok?...

The former: although the function is named "getTransactionStatus", it
does not really return the "status" of the transaction (aka
PQstatus()?).

Thank you, I'll think about how to improve it. Perhaps the name checkTransactionStatus would be better...

I'd insist, in a comment, that "cnt" does not include "skipped" transactions
(anymore).

If you mean CState.cnt, I'm not sure this is practically useful, because the code only uses the sum of all client transactions, including skipped and failed ones... Maybe we can rename this field to nxacts or total_cnt?

I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?

Perhaps renaming StatsData.cnt is better than just adding a comment to this field. But IMO we have the same problem ("they are all counters, so naming them "cnt" or "total_cnt" does not help much") for CState.cnt, which cannot be named in the same way because it also includes skipped and failed transactions.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
-- the attached script referenced above (presumably pgbench_concurrent_error.sql):
DO $$
  # create and immediately remove a directory; with two clients running
  # this concurrently, mkdir can fail with "File exists"
  my $directory = "my_directory";
  mkdir $directory
    or elog(ERROR, qq{could not create the directory "$directory": $!});
  # sleep very briefly before removing the directory
  select(undef, undef, undef, 0.00000000001);
  rmdir $directory
    or elog(ERROR, qq{could not delete the directory "$directory": $!});
$$ LANGUAGE plperlu;
