/* must be something wrong */
       pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
       goto done;

Should say such like "thread %d aborted: %s() failed: ...".

After having a lookg, there are already plenty such cases. I'd say not to change anything for beta, and think of it for the next round.

====
  errno = THREAD_BARRIER_INIT(&barrier, nthreads);
  if (errno != 0)
+  {
    pg_log_fatal("could not initialize barrier: %m");
+    exit(1);

This is a run-time error. Maybe we should return 2 in that case.

I think that you are right, but there are plenty such places where exit should be 2 instead of 1 if the doc is followed:

"""Errors during the run such as database errors or problems in the script will result in exit status 2."""

My beta take is to let these as they are, i.e. pretty inconsistent all over pgbench, and schedule a cleanup on the next round.

===
    if (thread->logfile == NULL)
    {
      pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-      goto done;
+      exit(1);

Maybe we should exit with 2 this case.

Yep.

The bench is not even started, this is not really run time yet, 1 seems ok. The failure may be due to a typo in the path which comes from the user.

If we exit this case, we might also want to exit when fclose() fails. (Currently the error of fclose() is ignored.)

Not sure. I'd let it at that for now.

I stand by this.

+        /* coldly abort on connection failure */
+        pg_log_fatal("cannot create connection for thread %d client %d",
+               thread->tid, i);
+        exit(1);

It seems to me that the "thread %d client %d(not clinent id but the
client index within the thread)" doesn't make sense to users.  Even if
we showed a message like that, it should show only the global clinent
id (cstate->id).

This is not obvious to me. I think that we should be homogeneous with what is already done around.

ok for only giving the global client id.

I think that we should return with 2 here but we return with 1
in another place for the same reason..

Possibly.

Again for this one, the bench has not really started, so 1 seems fine.

        /* must be something wrong */
-        pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+        pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
        goto done;

Why doesn't a fatal error cause an immediate exit?

Good point. I do not know, but I would expect it to be the case, and AFAICR it does not.

(And if we change this to fatal, we also need to change similar errors in the same function to fatal.)

Possibly.

On second look, I think that error is fine, indeed we do not stop the process, so "fatal" it is not;

Attached Yugo-san patch with some updates discussed in the previous mails, so as to move things along.

--
Fabien.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..202507f5d5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 					if ((st->con = doConnect()) == NULL)
 					{
+						/* as the bench is already running, we do not abort the process */
 						pg_log_error("client %d aborted while establishing connection", st->id);
 						st->state = CSTATE_ABORTED;
 						break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer(&stats);
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
 				 PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(&barrier, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
 #endif							/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 				exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-				/*
-				 * On connection failure, we meet the barrier here in place of
-				 * GO before proceeding to the "done" path which will cleanup,
-				 * so as to avoid locking the process.
-				 *
-				 * It is unclear whether it is worth doing anything rather
-				 * than coldly exiting with an error message.
-				 */
-				THREAD_BARRIER_WAIT(&barrier);
-				goto done;
+				/* coldly abort on initial connection failure */
+				pg_log_fatal("cannot create connection for client %d",
+							 state[i].id);
+				exit(1);
 			}
 		}
 
@@ -6707,7 +6711,7 @@ threadRun(void *arg)
 					continue;
 				}
 				/* must be something wrong */
-				pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+				pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 				goto done;
 			}
 		}

Reply via email to