I have seen this numerous times but had not dug into it, until now.

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed.  The "append" behavior dates back to 717f6d608.

I think it should either truncate the logfiles, or error early if any of the
files exist.  Or it could put all its output files into a newly-created
subdirectory.  Or this message could be output to the per-db logfiles, and not
just the static ones:
| "pg_upgrade run on %s".

For the per-db logfiels with OIDs in their name, changing open() from "append"
mode to truncate mode doesn't work, since they're written to in parallel.
They have to be removed/truncated in advance.

This is one possible fix.  You can test its effect by deliberately breaking one
of the calls to exec_progs(), like this.

-  "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+  "\"%s/pg_restore\" %s %s --exit-on-error --verboose "
>From 1b1a86df1febce5f55257c625b81e1499f6bfce3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sat, 11 Dec 2021 19:19:50 -0600
Subject: [PATCH] pg_upgrade: fail if logfiles exist..

Rather than appending to them, which is confusing since old errors are still
present.
---
 src/bin/pg_upgrade/check.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1106912863..92060e5abe 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -18,6 +18,7 @@ static void check_new_cluster_is_empty(void);
 static void check_databases_are_compatible(void);
 static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
 static bool equivalent_locale(int category, const char *loca, const char *locb);
+static void remove_logfiles(void);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_proper_datallowconn(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
@@ -168,7 +169,10 @@ check_and_dump_old_cluster(bool live_check)
 	 * the old server is running.
 	 */
 	if (!user_opts.check)
+	{
+		remove_logfiles();
 		generate_old_dump();
+	}
 
 	if (!live_check)
 		stop_postmaster(false);
@@ -635,6 +639,24 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name)
 	check_ok();
 }
 
+/*
+ * Remove any per-db logfiles left around by a previous upgrade attempt.
+ * The statically-named logfiles aren't touched, since there are clear
+ * separators added by parseCommandLine().
+ */
+static void
+remove_logfiles(void)
+{
+	int dbnum;
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		char	log_file_name[MAXPGPATH];
+		DbInfo	*old_db = &old_cluster.dbarr.dbs[dbnum];
+		snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+		if (unlink(log_file_name) < 0 && errno != ENOENT)
+			pg_log(PG_FATAL, "failed to remove pre-existing logfile: %s: %m", log_file_name);
+	}
+}
 
 /*
  *	check_is_install_user()
-- 
2.17.0

Reply via email to