Hi,
On 2025/09/22 11:56, Fujii Masao wrote:
> On Sat, Sep 20, 2025 at 12:21 AM Yugo Nagata <[email protected]> wrote:
>>> While testing, I found that running pgbench with --continue-on-error and
>>> pipeline mode triggers the following assertion failure. Could this be
>>> a bug in the patch?
>>>
>>> ---------------------------------------------------
>>> $ cat pipeline.pgbench
>>> \startpipeline
>>> DO $$
>>> BEGIN
>>> PERFORM pg_sleep(3);
>>> PERFORM pg_terminate_backend(pg_backend_pid());
>>> END $$;
>>> \endpipeline
>>>
>>> $ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M
>>> extended --continue-on-error
>>> ...
>>> Assertion failed:
>>> (sql_script[st->use_file].commands[st->command]->type == 1), function
>>> commandError, file pgbench.c, line 3081.
>>> Abort trap: 6
>>> ---------------------------------------------------
>>>
>>> When I ran the same command without --continue-on-error,
>>> the assertion failure did not occur.
>>
>> I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench
>> to retry and added the --verbose-errors option, rather than by this patch
>> itself.
>>
>> The assertion failure occurs in commandError(), which is called to report an
>> error when
>> it can be retried (i.e., serializable failure or deadlock), or when
>> --continue-on-error
>> is used after this patch.
>>
>> Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
>>
>> This assumes the error is always detected during SQL command execution, but
>> that’s not correct, since in pipeline mode, the error can be detected when
>> a \endpipeline meta-command is executed.
>>
>> $ cat deadlock.sql
>> \startpipeline
>> begin;
>> lock b;
>> lock a;
>> end;
>> \endpipeline
>>
>> $ cat deadlock2.sql
>> \startpipeline
>> begin;
>> lock a;
>> lock b;
>> end;
>> \endpipeline
>>
>> $ pgbench --verbose-errors -f deadlock.sql -f deadlock2.sql -c 2 -T 3 -M
>> extended
>> pgbench (19devel)
>> starting vacuum...end.
>> pgbench: pgbench.c:3062: commandError: Assertion
>> `sql_script[st->use_file].commands[st->command]->type == 1' failed.
>>
>> Although one option would be to remove this assertion, if we prefer to keep
>> it,
>> the attached patch fixes the issue.
>
> Thanks for the analysis and the patch!
>
> I think we should fix the issue rather than just removing the assertion.
> I'd like to apply your patch with the following source comment:
>
> ---------------------------
> Errors should only be detected during an SQL command or the \endpipeline
> meta command. Any other case triggers an assertion failure.
> --------------------------
>
>
> With your patch and the continue-on-error patches, running the same pgbench
> command I used to reproduce the assertion failure upthread causes pgbench
> to hang. From my analysis, it enters an infinite loop in discardUntilSync().
> That loop waits for PGRES_PIPELINE_SYNC, but since the connection has already
> been closed, it never arrives, leaving pgbench stuck.
>
> Could this also happen without the continue-on-error patch, or is it a new bug
> introduced by it? Either way, it seems pgbench needs to exit the loop when
> the result status is PGRES_FATAL_ERROR.
>
Thank you for the analysis and the patches.
I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().
I've attached a patch that fixes the assertion error. The content of v1 patch by
Mr. Nagata is also included. I would appreciate it if you review my patch.
Regards,
Rintaro Ikeda
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6e9304e254f..cd5faf3370a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3078,7 +3078,13 @@ commandFailed(CState *st, const char *cmd, const char
*message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type ==
SQL_COMMAND);
+ /*
+ Errors should only be detected during an SQL command or the
\endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */
+ Assert(sql_script[st->use_file].commands[st->command]->type ==
SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta ==
META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d;
%s",
st->id, st->command, st->use_file, message);
}
@@ -3525,9 +3531,7 @@ discardUntilSync(CState *st)
{
PGresult *res = PQgetResult(st->con);
- if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
- received_sync = true;
- else if (received_sync)
+ if (received_sync == true)
{
/*
* PGRES_PIPELINE_SYNC must be followed by another
@@ -3541,11 +3545,23 @@ discardUntilSync(CState *st)
*/
st->num_syncs = 0;
PQclear(res);
- break;
+ goto done;
}
- PQclear(res);
+
+ switch (PQresultStatus(res))
+ {
+ case PGRES_PIPELINE_SYNC:
+ received_sync = true;
+ case PGRES_FATAL_ERROR:
+ PQclear(res);
+ goto done;
+ default:
+ PQclear(res);
+ }
+
}
+done:
/* exit pipeline */
if (PQexitPipelineMode(st->con) != 1)
{