Bruce, * Bruce Momjian (br...@momjian.us) wrote: > I have applied the attached patch to show examples of using rsync on > PGDATA and tablespaces, documented that rsync is only useful when in > link mode, and explained more clearly how rsync handles links. You can > see the results here: > > http://momjian.us/pgsql_docs/pgupgrade.html > > Any more improvements?
First off, I'd strongly suggest that we make "Step 1" in the pg_upgrade process be "take a full backup and verify that you're able to restore it successfully and without corruption." I don't particularly care for how this seems to imply that the Rsync method is "the" method to use when --link mode is used with pg_upgrade. I'd reword the section title to be along these lines: If you have streaming replicas or log-shipping standby servers then they will also need to be updated. The simplest way to accomplish this is to simply rebuild the replicas from scratch once the primary is back online. Unfortunately, that can take a long time for larger systems as the data has to be copied from the primary to each replica in the environment. If --link mode was used with pg_upgrade, the Latest checkpoint location matches between the primary and the replica(s) (as discussed in Step 8), the rsync utility is available, and the existing data directory and new data directory on the replica are able to be in a common directory on the same filesystem (as is required on the primary for --link mode to be used), then an alternative method may be used to update the replicas using rsync which will generally require much less time. Note that this method will end up needlessly copying across temporary files and unlogged tables. If these make up a large portion of your database, then rebuilding the replicas from scratch may be a better option. With this method, you will not be running pg_upgrade on the standby servers, but rather rsync on the primary to sync the replicas to match the results of the pg_upgrade on the primary. Do not start any servers yet. If you did not use link mode, skip the instructions in this section and simply recreate the standby servers. This method requires that the *old* data directory on the replica be in place as rsync will be creating a hard-link tree between the old data files on the replica and the new data directory on the replica (as was done by pg_upgrade on the primary). a. Install the new PostgreSQL binaries on standby servers. ... b. Make sure the new standby data directories do not exist If initdb was run on the replica to create a new data directory, remove that new data directory (the rsync will recreate it). Do *not* remove the existing old data directory. c. Install custom shared object files ** I would probably move this up to be step 'b' instead, and make step 'b' be step 'c' instead. d. Stop standby servers ... *new* e. Verify/re-verify that Latest checkpoint location in pg_controldata on the replica matches that of the primary (from before the primary was upgraded with pg_upgrade). f. Save configuration files ** this should have a caveat that it's only necessary if the config files are in the data directory. g. Run rsync ** I am having a hard time figuring out why --delete makes sense here. There shouldn't be anything in the new data directory, and we don't actually need to delete anything in the old data directory on the replica, so what are we doing suggesting --delete be used? Strikes me as unnecessairly adding risk, should someone end up doing the wrong command. Also, again, if I was doing this, I'd absolutely run rsync with --dry-run for starters and review what it is going to do and make sure that's consistent with what I'd expect. Thanks! Stephen
signature.asc
Description: Digital signature