I have seen this numerous times but had not dug into it, until now. If pg_upgrade fails and is re-run, it appends to its logfiles, which is confusing since, if it fails again, it then looks like the original error recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.
I think it should either truncate the logfiles, or error early if any of the files exist. Or it could put all its output files into a newly-created subdirectory. Or this message could be output to the per-db logfiles, and not just the static ones: | "pg_upgrade run on %s". For the per-db logfiels with OIDs in their name, changing open() from "append" mode to truncate mode doesn't work, since they're written to in parallel. They have to be removed/truncated in advance. This is one possible fix. You can test its effect by deliberately breaking one of the calls to exec_progs(), like this. - "\"%s/pg_restore\" %s %s --exit-on-error --verbose " + "\"%s/pg_restore\" %s %s --exit-on-error --verboose "
>From 1b1a86df1febce5f55257c625b81e1499f6bfce3 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Sat, 11 Dec 2021 19:19:50 -0600 Subject: [PATCH] pg_upgrade: fail if logfiles exist.. Rather than appending to them, which is confusing since old errors are still present. --- src/bin/pg_upgrade/check.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 1106912863..92060e5abe 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -18,6 +18,7 @@ static void check_new_cluster_is_empty(void); static void check_databases_are_compatible(void); static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb); static bool equivalent_locale(int category, const char *loca, const char *locb); +static void remove_logfiles(void); static void check_is_install_user(ClusterInfo *cluster); static void check_proper_datallowconn(ClusterInfo *cluster); static void check_for_prepared_transactions(ClusterInfo *cluster); @@ -168,7 +169,10 @@ check_and_dump_old_cluster(bool live_check) * the old server is running. */ if (!user_opts.check) + { + remove_logfiles(); generate_old_dump(); + } if (!live_check) stop_postmaster(false); @@ -635,6 +639,24 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name) check_ok(); } +/* + * Remove any per-db logfiles left around by a previous upgrade attempt. + * The statically-named logfiles aren't touched, since there are clear + * separators added by parseCommandLine(). + */ +static void +remove_logfiles(void) +{ + int dbnum; + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) + { + char log_file_name[MAXPGPATH]; + DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum]; + snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid); + if (unlink(log_file_name) < 0 && errno != ENOENT) + pg_log(PG_FATAL, "failed to remove pre-existing logfile: %s: %m", log_file_name); + } +} /* * check_is_install_user() -- 2.17.0