On Wed, Nov 28, 2012 at 03:22:32PM -0500, Bruce Momjian wrote:
> On Tue, Nov 27, 2012 at 09:35:10PM -0800, Jeff Janes wrote:
> > > I tested custom format with pg_restore -j and -1, as well as text
> > > restore.  The winner was pg_dump -Fc | pg_restore -1;
> > 
> > I don't have the numbers at hand, but if my relcache patch is
> > accepted, then "-1" stops being faster.
> > 
> > -1 gets rid of the AtOEXAct relcache N^2 behavior, but at the cost of
> > invoking a different N^2, that one in the stats system.
> 
> OK, here are the testing results:
> 
>       #tbls       git     -1    AtOEXAct  both
>           1      11.06   13.06   10.99   13.20
>        1000      21.71   22.92   22.20   22.51
>        2000      32.86   31.09   32.51   31.62
>        4000      55.22   49.96   52.50   49.99
>        8000     105.34   82.10   95.32   82.94
>       16000     223.67  164.27  187.40  159.53
>       32000     543.93  324.63  366.44  317.93
>       64000    1697.14  791.82  767.32  752.57
> 
> Up to 2k, they are all similar.  4k & 8k have the -1 patch as a win, and
> 16k+ really need both patches.
> 
> I will continue working on the -1 patch, and hopefully we can get your
> AtOEXAct patch in soon.  Is someone reviewing that?

I have polished up the patch (attached) and it is ready for application
to 9.3.

Since there is no pg_dump/pg_restore pipe parallelism, I had the old
cluster create per-database dump files, so I don't need to have the old
and new clusters running at the same time.  Running both would have
required two port numbers and made shared memory exhaustion more likely.

We now create one dump file per database, so a cluster with thousands of
databases will produce thousands of dump files, which might cause a
performance problem.
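In outline, the per-database dump step works like this (a sketch only;
the OID:name pairs below are placeholders, since pg_upgrade reads the
real list from the old cluster's catalogs):

```shell
# Hypothetical OID:name pairs; pg_upgrade gets these from the old
# cluster, so treat the values below as illustrative placeholders.
dbs="16384:postgres 16385:accounting"

for entry in $dbs; do
    oid=${entry%%:*}    # the OID keys the per-database dump file name
    db=${entry#*:}
    # The custom-format, schema-only dump command run for each database:
    cmd="pg_dump --schema-only --binary-upgrade --format=custom --file=pg_upgrade_dump_${oid}.custom ${db}"
    echo "$cmd"
done
```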

This also adds status output so you can see the database names as their
schemas are dumped and restored.  This was requested by users.

I retained custom mode for pg_dump because it is measurably faster than
text mode (not sure why, psql overhead?):

                    git     -Fc     -Fp
            1      11.04   11.08   11.02
         1000      22.37   19.68   21.64
         2000      32.39   28.62   31.40
         4000      56.18   48.53   51.15
         8000     105.15   81.23   91.84
        16000     227.64  156.72  177.79
        32000     542.80  323.19  371.81
        64000    1711.77  789.17  865.03

Text dump files are slightly easier to debug, but probably not by much.

Single-transaction restores were recommended to me over a year ago (by
Magnus?), but I wanted to get pg_upgrade rock-solid before optimizing;
now is the right time.
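The restore side is then one pg_restore per database, wrapped in a single
transaction so the schema load for each database succeeds or fails as a
unit (sketch only; database name and file name are invented for
illustration):

```shell
# One pg_restore per database; --single-transaction makes the whole
# schema load atomic, and --exit-on-error stops at the first problem.
db="accounting"
dumpfile="pg_upgrade_dump_16385.custom"
cmd="pg_restore --exit-on-error --single-transaction --dbname ${db} ${dumpfile}"
echo "$cmd"
```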

One risk of single-transaction restores is max_locks_per_transaction
exhaustion, but you would have to increase that on the old cluster for
pg_dump anyway, because the dump is also done in a single transaction.
The only new thing is that the new cluster might also need its
max_locks_per_transaction raised.
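For reference, an illustrative postgresql.conf adjustment (the value is
a guess; the right number depends on how many objects the largest
database's restore transaction has to lock):

```
# The shared lock table has room for roughly
#   max_locks_per_transaction * (max_connections + max_prepared_transactions)
# locks, and a --single-transaction restore keeps a lock on every object
# it creates, so large schemas may need this raised from the default (64).
max_locks_per_transaction = 128
```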

I was able to remove split_old_dump() because pg_dumpall now produces a
full global restore file and we do database dumps separately.
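The globals side, sketched the same way (this mirrors the exec_prog()
call in the dump.c hunk; note the pg_dumpall.c hunk makes --globals-only
also emit CREATE DATABASE commands when --binary-upgrade is given):

```shell
# Globals (roles, tablespaces, and now CREATE DATABASE in binary-upgrade
# mode) come from a single pg_dumpall pass, so no post-hoc splitting of
# one big SQL file is needed.
cmd="pg_dumpall --schema-only --globals-only --binary-upgrade -f pg_upgrade_dump_globals.sql"
echo "$cmd"
```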

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 285f10c..bccceb1
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** output_check_banner(bool *live_check)
*** 72,78 ****
  
  
  void
! check_old_cluster(bool live_check, char **sequence_script_file_name)
  {
  	/* -- OLD -- */
  
--- 72,78 ----
  
  
  void
! check_and_dump_old_cluster(bool live_check, char **sequence_script_file_name)
  {
  	/* -- OLD -- */
  
*************** check_old_cluster(bool live_check, char
*** 131,140 ****
  	 * the old server is running.
  	 */
  	if (!user_opts.check)
- 	{
  		generate_old_dump();
- 		split_old_dump();
- 	}
  
  	if (!live_check)
  		stop_postmaster(false);
--- 131,137 ----
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index 577ccac..d206e98
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
***************
*** 16,110 ****
  void
  generate_old_dump(void)
  {
! 	/* run new pg_dumpall binary */
! 	prep_status("Creating catalog dump");
  
! 	/*
! 	 * --binary-upgrade records the width of dropped columns in pg_class, and
! 	 * restores the frozenid's for databases and relations.
! 	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
! 			  ALL_DUMP_FILE);
! 	check_ok();
! }
! 
! 
! /*
!  *	split_old_dump
!  *
!  *	This function splits pg_dumpall output into global values and
!  *	database creation, and per-db schemas.	This allows us to create
!  *	the support functions between restoring these two parts of the
!  *	dump.  We split on the first "\connect " after a CREATE ROLE
!  *	username match;  this is where the per-db restore starts.
!  *
!  *	We suppress recreation of our own username so we don't generate
!  *	an error during restore
!  */
! void
! split_old_dump(void)
! {
! 	FILE	   *all_dump,
! 			   *globals_dump,
! 			   *db_dump;
! 	FILE	   *current_output;
! 	char		line[LINE_ALLOC];
! 	bool		start_of_line = true;
! 	char		create_role_str[MAX_STRING];
! 	char		create_role_str_quote[MAX_STRING];
! 	char		filename[MAXPGPATH];
! 	bool		suppressed_username = false;
! 
! 
! 	/* 
! 	 * Open all files in binary mode to avoid line end translation on Windows,
! 	 * both for input and output.
! 	 */
! 
! 	snprintf(filename, sizeof(filename), "%s", ALL_DUMP_FILE);
! 	if ((all_dump = fopen(filename, PG_BINARY_R)) == NULL)
! 		pg_log(PG_FATAL, "Could not open dump file \"%s\": %s\n", filename, getErrorText(errno));
! 	snprintf(filename, sizeof(filename), "%s", GLOBALS_DUMP_FILE);
! 	if ((globals_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
! 		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n", filename, getErrorText(errno));
! 	snprintf(filename, sizeof(filename), "%s", DB_DUMP_FILE);
! 	if ((db_dump = fopen_priv(filename, PG_BINARY_W)) == NULL)
! 		pg_log(PG_FATAL, "Could not write to dump file \"%s\": %s\n", filename, getErrorText(errno));
! 
! 	current_output = globals_dump;
! 
! 	/* patterns used to prevent our own username from being recreated */
! 	snprintf(create_role_str, sizeof(create_role_str),
! 			 "CREATE ROLE %s;", os_info.user);
! 	snprintf(create_role_str_quote, sizeof(create_role_str_quote),
! 			 "CREATE ROLE %s;", quote_identifier(os_info.user));
  
! 	while (fgets(line, sizeof(line), all_dump) != NULL)
  	{
! 		/* switch to db_dump file output? */
! 		if (current_output == globals_dump && start_of_line &&
! 			suppressed_username &&
! 			strncmp(line, "\\connect ", strlen("\\connect ")) == 0)
! 			current_output = db_dump;
  
! 		/* output unless we are recreating our own username */
! 		if (current_output != globals_dump || !start_of_line ||
! 			(strncmp(line, create_role_str, strlen(create_role_str)) != 0 &&
! 			 strncmp(line, create_role_str_quote, strlen(create_role_str_quote)) != 0))
! 			fputs(line, current_output);
! 		else
! 			suppressed_username = true;
  
! 		if (strlen(line) > 0 && line[strlen(line) - 1] == '\n')
! 			start_of_line = true;
! 		else
! 			start_of_line = false;
  	}
  
! 	fclose(all_dump);
! 	fclose(globals_dump);
! 	fclose(db_dump);
  }
--- 16,49 ----
  void
  generate_old_dump(void)
  {
! 	int			dbnum;
  
! 	prep_status("Creating catalog dump\n");
! 
! 	pg_log(PG_REPORT, OVERWRITE_MESSAGE, "global objects");
! 
! 	/* run new pg_dumpall binary for globals */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --globals-only --binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
! 			  GLOBALS_DUMP_FILE);
  
!  	/* create per-db dump files */
! 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
  	{
! 		char 		file_name[MAXPGPATH];
! 		DbInfo     *old_db = &old_cluster.dbarr.dbs[dbnum];
  
! 		pg_log(PG_REPORT, OVERWRITE_MESSAGE, old_db->db_name);
! 		snprintf(file_name, sizeof(file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
  
! 		exec_prog(RESTORE_LOG_FILE, NULL, true,
! 				  "\"%s/pg_dump\" %s --schema-only --binary-upgrade --format=custom %s --file=\"%s\" \"%s\"",
! 				  new_cluster.bindir, cluster_conn_opts(&old_cluster),
! 				  log_opts.verbose ? "--verbose" : "", file_name, old_db->db_name);
  	}
  
! 	end_progress_output();
! 	check_ok();
  }
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index 76247fd..35de541
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*************** exec_prog(const char *log_file, const ch
*** 104,111 ****
  
  	if (result != 0)
  	{
! 		report_status(PG_REPORT, "*failure*");
  		fflush(stdout);
  		pg_log(PG_VERBOSE, "There were problems executing \"%s\"\n", cmd);
  		if (opt_log_file)
  			pg_log(throw_error ? PG_FATAL : PG_REPORT,
--- 104,113 ----
  
  	if (result != 0)
  	{
! 		/* we might be on a progress status line, so go to the next line */
! 		report_status(PG_REPORT, "\n*failure*");
  		fflush(stdout);
+ 
  		pg_log(PG_VERBOSE, "There were problems executing \"%s\"\n", cmd);
  		if (opt_log_file)
  			pg_log(throw_error ? PG_FATAL : PG_REPORT,
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 4d2e79c..bdc6d13
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 92,98 ****
  
  	check_cluster_compatibility(live_check);
  
! 	check_old_cluster(live_check, &sequence_script_file_name);
  
  
  	/* -- NEW -- */
--- 92,98 ----
  
  	check_cluster_compatibility(live_check);
  
! 	check_and_dump_old_cluster(live_check, &sequence_script_file_name);
  
  
  	/* -- NEW -- */
*************** create_new_objects(void)
*** 282,287 ****
--- 282,292 ----
  
  	prep_status("Adding support functions to new cluster");
  
+ 	/*
+ 	 *	Technically, we only need to install these support functions in new
+ 	 *	databases that also exist in the old cluster, but for completeness
+ 	 *	we process all new databases.
+ 	 */
  	for (dbnum = 0; dbnum < new_cluster.dbarr.ndbs; dbnum++)
  	{
  		DbInfo	   *new_db = &new_cluster.dbarr.dbs[dbnum];
*************** create_new_objects(void)
*** 292,302 ****
  	}
  	check_ok();
  
! 	prep_status("Restoring database schema to new cluster");
! 	exec_prog(RESTORE_LOG_FILE, NULL, true,
! 			  "\"%s/psql\" " EXEC_PSQL_ARGS " %s -f \"%s\"",
! 			  new_cluster.bindir, cluster_conn_opts(&new_cluster),
! 			  DB_DUMP_FILE);
  	check_ok();
  
  	/* regenerate now that we have objects in the databases */
--- 297,323 ----
  	}
  	check_ok();
  
! 	prep_status("Restoring database schema to new cluster\n");
! 
! 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
! 	{
! 		char file_name[MAXPGPATH];
! 		DbInfo     *old_db = &old_cluster.dbarr.dbs[dbnum];
! 
! 		pg_log(PG_REPORT, OVERWRITE_MESSAGE, old_db->db_name);
! 		snprintf(file_name, sizeof(file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
! 
! 		/*
! 		 *	Using pg_restore --single-transaction is faster than other
! 		 *	methods, like --jobs.  pg_dump only produces its output at the
! 		 *	end, so there is little parallelism using the pipe.
! 		 */
! 		exec_prog(RESTORE_LOG_FILE, NULL, true,
! 				  "\"%s/pg_restore\" %s --exit-on-error --single-transaction --verbose --dbname \"%s\" \"%s\"",
! 				  new_cluster.bindir, cluster_conn_opts(&new_cluster),
! 				  old_db->db_name, file_name);
! 	}
! 	end_progress_output();
  	check_ok();
  
  	/* regenerate now that we have objects in the databases */
*************** cleanup(void)
*** 455,468 ****
  	/* Remove dump and log files? */
  	if (!log_opts.retain)
  	{
  		char	  **filename;
  
  		for (filename = output_files; *filename != NULL; filename++)
  			unlink(*filename);
  
! 		/* remove SQL files */
! 		unlink(ALL_DUMP_FILE);
  		unlink(GLOBALS_DUMP_FILE);
! 		unlink(DB_DUMP_FILE);
  	}
  }
--- 476,498 ----
  	/* Remove dump and log files? */
  	if (!log_opts.retain)
  	{
+ 		int			dbnum;
  		char	  **filename;
  
  		for (filename = output_files; *filename != NULL; filename++)
  			unlink(*filename);
  
! 		/* remove dump files */
  		unlink(GLOBALS_DUMP_FILE);
! 
! 		if (old_cluster.dbarr.dbs)
! 			for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
! 			{
! 				char file_name[MAXPGPATH];
! 				DbInfo     *old_db = &old_cluster.dbarr.dbs[dbnum];
! 
! 				snprintf(file_name, sizeof(file_name), DB_DUMP_FILE_MASK, old_db->db_oid);
! 				unlink(file_name);
! 			}
  	}
  }
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index ace56e5..d981035
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 29,38 ****
  #define OVERWRITE_MESSAGE	"  %-" MESSAGE_WIDTH "." MESSAGE_WIDTH "s\r"
  #define GET_MAJOR_VERSION(v)	((v) / 100)
  
- #define ALL_DUMP_FILE		"pg_upgrade_dump_all.sql"
  /* contains both global db information and CREATE DATABASE commands */
  #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
! #define DB_DUMP_FILE		"pg_upgrade_dump_db.sql"
  
  #define SERVER_LOG_FILE		"pg_upgrade_server.log"
  #define RESTORE_LOG_FILE	"pg_upgrade_restore.log"
--- 29,37 ----
  #define OVERWRITE_MESSAGE	"  %-" MESSAGE_WIDTH "." MESSAGE_WIDTH "s\r"
  #define GET_MAJOR_VERSION(v)	((v) / 100)
  
  /* contains both global db information and CREATE DATABASE commands */
  #define GLOBALS_DUMP_FILE	"pg_upgrade_dump_globals.sql"
! #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
  
  #define SERVER_LOG_FILE		"pg_upgrade_server.log"
  #define RESTORE_LOG_FILE	"pg_upgrade_restore.log"
*************** extern OSInfo os_info;
*** 296,307 ****
  /* check.c */
  
  void		output_check_banner(bool *live_check);
! void check_old_cluster(bool live_check,
  				  char **sequence_script_file_name);
  void		check_new_cluster(void);
  void		report_clusters_compatible(void);
  void		issue_warnings(char *sequence_script_file_name);
! void output_completion_banner(char *analyze_script_file_name,
  						 char *deletion_script_file_name);
  void		check_cluster_versions(void);
  void		check_cluster_compatibility(bool live_check);
--- 295,306 ----
  /* check.c */
  
  void		output_check_banner(bool *live_check);
! void		check_and_dump_old_cluster(bool live_check,
  				  char **sequence_script_file_name);
  void		check_new_cluster(void);
  void		report_clusters_compatible(void);
  void		issue_warnings(char *sequence_script_file_name);
! void		output_completion_banner(char *analyze_script_file_name,
  						 char *deletion_script_file_name);
  void		check_cluster_versions(void);
  void		check_cluster_compatibility(bool live_check);
*************** void		disable_old_cluster(void);
*** 319,325 ****
  /* dump.c */
  
  void		generate_old_dump(void);
- void		split_old_dump(void);
  
  
  /* exec.c */
--- 318,323 ----
*************** __attribute__((format(PG_PRINTF_ATTRIBUT
*** 433,438 ****
--- 431,437 ----
  void
  pg_log(eLogType type, char *fmt,...)
  __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+ void		end_progress_output(void);
  void
  prep_status(const char *fmt,...)
  __attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 7dbaac9..14e66df
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
*************** transfer_all_new_dbs(DbInfoArr *old_db_a
*** 82,90 ****
  		}
  	}
  
! 	prep_status(" ");			/* in case nothing printed; pass a space so
! 								 * gcc doesn't complain about empty format
! 								 * string */
  	check_ok();
  
  	return msg;
--- 82,88 ----
  		}
  	}
  
! 	end_progress_output();
  	check_ok();
  
  	return msg;
diff --git a/contrib/pg_upgrade/util.c b/contrib/pg_upgrade/util.c
new file mode 100644
index 1d4bc89..0c1eccc
*** a/contrib/pg_upgrade/util.c
--- b/contrib/pg_upgrade/util.c
*************** report_status(eLogType type, const char
*** 35,40 ****
--- 35,52 ----
  }
  
  
+ /* force blank output for progress display */
+ void
+ end_progress_output(void)
+ {
+ 	/*
+ 	 *	In case nothing printed; pass a space so gcc doesn't complain about
+ 	 *	empty format string.
+ 	 */
+ 	prep_status(" ");
+ }
+ 
+ 
  /*
   * prep_status
   *
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
new file mode 100644
index ca95bad..83eae81
*** a/src/bin/pg_dump/pg_dumpall.c
--- b/src/bin/pg_dump/pg_dumpall.c
*************** main(int argc, char *argv[])
*** 502,508 ****
  		}
  
  		/* Dump CREATE DATABASE commands */
! 		if (!globals_only && !roles_only && !tablespaces_only)
  			dumpCreateDB(conn);
  
  		/* Dump role/database settings */
--- 502,508 ----
  		}
  
  		/* Dump CREATE DATABASE commands */
! 		if (binary_upgrade || (!globals_only && !roles_only && !tablespaces_only))
  			dumpCreateDB(conn);
  
  		/* Dump role/database settings */
*************** dumpRoles(PGconn *conn)
*** 745,753 ****
  		 * will acquire the right properties even if it already exists (ie, it
  		 * won't hurt for the CREATE to fail).  This is particularly important
  		 * for the role we are connected as, since even with --clean we will
! 		 * have failed to drop it.
  		 */
! 		appendPQExpBuffer(buf, "CREATE ROLE %s;\n", fmtId(rolename));
  		appendPQExpBuffer(buf, "ALTER ROLE %s WITH", fmtId(rolename));
  
  		if (strcmp(PQgetvalue(res, i, i_rolsuper), "t") == 0)
--- 745,755 ----
  		 * will acquire the right properties even if it already exists (ie, it
  		 * won't hurt for the CREATE to fail).  This is particularly important
  		 * for the role we are connected as, since even with --clean we will
! 		 * have failed to drop it.  In binary_upgrade mode we must not
! 		 * generate any errors, so we assume the role is already created.
  		 */
! 		if (!binary_upgrade)
! 			appendPQExpBuffer(buf, "CREATE ROLE %s;\n", fmtId(rolename));
  		appendPQExpBuffer(buf, "ALTER ROLE %s WITH", fmtId(rolename));
  
  		if (strcmp(PQgetvalue(res, i, i_rolsuper), "t") == 0)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)