On 1/22/15 7:54 PM, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> >On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > >Or do you - as the text edited in your patch, but not the quote above -
> > >mean to run pg_upgrade just on the primary and then rsync?
> >
> >No, I was going to run it on both, then rsync.
> I'm pretty sure this is all a lot easier than you believe it to be.  If
> you want to recreate what pg_upgrade does to a cluster, then the simplest
> thing to do is rsync before removing any of the hard links.  rsync will
> simply recreate the same hard-link tree that pg_upgrade created when it
> ran, and update the files that actually changed (the catalog tables).
>
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ.  You can actually get
> around that with rsync if you really want, though: tell it to only look
> at file sizes instead of size+time by passing --size-only.
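
For the archives, I think the invocation Stephen is describing comes out
to something like this (untested, and the paths and hostname are
invented); --hard-links is what makes rsync rebuild pg_upgrade's
hard-link tree on the far side:

    # Untested sketch; paths and hostname are invented.
    # --hard-links recreates the hard-link tree on the standby, and
    # --size-only skips the mtime half of rsync's quick check.
    rsync --archive --hard-links --size-only \
        /var/lib/pgsql/ standby.example.com:/var/lib/pgsql/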

What if, instead of trying to handle that on the rsync side, we changed 
pg_upgrade so that it created hard links with the same timestamp as the 
original file?
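
Though now that I type that, a hard link is just another name for the
same inode, so it already carries the original's timestamps. Easy enough
to check (GNU stat shown; the flags differ on BSD):

    # A hard link shares the inode, so its mtime is identical by
    # construction.
    touch orig
    ln orig alias
    stat -c '%i %Y %n' orig alias    # same inode number, same mtime

So presumably the timestamps that actually differ are the master's files
versus the standby's independently written copies, which pg_upgrade by
itself can't do anything about.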

That said, the whole timestamp race condition in rsync gives me the 
heebie-jeebies: a file whose size and mtime happen to match gets skipped 
even if its contents changed. For normal workloads maybe it's not that 
big a deal, but when dealing with fixed-size data (i.e., Postgres blocks, 
where the size never changes)? Eww.
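
If the timestamps can't be trusted and size-only is too scary, the
slow-but-safe rsync answer is to compare contents instead; same sketch as
above, same invented paths:

    # Untested sketch; paths and hostname are invented.  --checksum makes
    # rsync hash every file on both ends instead of trusting size+mtime,
    # so a same-size change can't slip through, at the cost of reading
    # every file on both sides.
    rsync --archive --hard-links --checksum \
        /var/lib/pgsql/ standby.example.com:/var/lib/pgsql/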

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? 
Could we have it create a shell script instead of directly modifying things itself? Or 
perhaps some custom "command file" that could then be replayed by pg_upgrade on 
another server? Of course, that's assuming that replicas are compatible enough with 
masters for that to work...
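
To make the "command file" idea concrete, here is the sort of thing I'm
picturing pg_upgrade spitting out for replay on a standby. Entirely
hypothetical: no such output mode exists today, and every path, OID, and
variable here is invented:

    #!/bin/sh
    # Hypothetical replay script; nothing below exists in pg_upgrade today.
    set -e
    OLD=/var/lib/pgsql/9.3/data
    NEW=/var/lib/pgsql/9.4/data
    # Unchanged user relation files get hard-linked into the new cluster,
    # mirroring what pg_upgrade --link did on the master.
    ln "$OLD/base/16384/16385" "$NEW/base/16402/16385"
    # Rewritten files (the system catalogs) can't be linked; they would
    # have to be shipped from the master ($CATALOG_PAYLOAD is invented)
    # and copied into place.
    cp "$CATALOG_PAYLOAD/global/1262" "$NEW/global/1262"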
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

