Re: [GENERAL] Server move using rsync

2011-11-02 Thread Robert Treat
On Tue, Nov 1, 2011 at 11:08 AM, Alan Hodgson ahodg...@simkin.ca wrote:
 On October 31, 2011 03:01:19 PM Stephen Denne wrote:
 I'm wondering whether it's worth doing anyway, simply to check that it
 doesn't do something completely unexpected, which would presumably alert
 us to something we hadn't considered.


 Testing is always worthwhile, if only to ensure that PostgreSQL will actually
 run with your configuration on the new machine (sufficient shared memory, IP
 addresses specified in postgresql.conf, etc).

 However, assuming the PostgreSQL binary packages you're using are identical,
 and assuming that you aren't changing tablespace pointers around, the rsync /
 restart is pretty fool-proof in terms of reliably copying PostgreSQL itself.
 PostgreSQL is good about updating time stamps on modified files, you don't 
 have
 to worry about needing the full compare options on rsync or anything  -avr --
 delete is generally sufficient .

 You might disable WAL archiving during a test startup to avoid sending
 duplicates to your backup server.


You know, this looks like it will work, but if I were you, I would set
up the database as a PITR standby on the new box, and have WAL
shipping in place. When you're ready to move, you shutdown the old
database, synch up the xlogs, and then failover to the new database.
Not only should this be faster, it seems less error prone, and you can
actually test the failover and lunch bits while the original server is
up and running.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server move using rsync

2011-11-02 Thread Venkat Balaji

 We're not doing this long-term, in order to have a backup server we can
 fail-over to, but rather as a one-off low impact move of our database.
 Consequently, instead of using pg_start_backup and pg_stop_backup, and
 keeping all WAL, we're stopping the database, rsync of everything, and
 starting the database in the new server, with it appearing to the new
 server (if it was capable of noticing such things) that it had simply been
 shutdown and restarted.


This is fine. If the database is shutdown, then the backup is completely
safe. You can bring up the cluster as on backup time without any issues.


 The initial and repeated rsyncs while the first server is running and in
 use, are solely in order to reduce the time that the rsync takes while the
 postgresql application is stopped.



 Do you still think we need to do anything special with pg_start_backup,
 pg_stop_backup, and WAL archives?


Yes, after the initial sync, if the next repeated rsyncs are performed
while the database cluster is up and running, then
pg_start_backup()-rsync-pg_stop_backup() (as said earlier) must be
performed. This will help Postgres know that the backup is going on. When
you do pg_start_backup(), Postgres will make note and updates all the base
file headers and makes a note of the TXN ids and Checkpoint time by
creating a label. So, the WAL archives at time are needed for recovery (to
recover any half written transactions).

Without doing pg_start_backup, and with rsync not performing a snapshot
 backup, my assumption is that until we do an rsync with the service
 shutdown, whatever we've got at the location we're copying to, is not
 self-consistent.


Above explanation should answer this.


 If we start up postgresql on it, won't it think it is recovering from a
 sudden crash? I think it may either appear to recover ok, or complain about
 various things, and not start up ok, with neither option providing us with
 much insight, as all that could tell us is that either some disk blocks are
 consistent, or some are not, which is our starting assumption anyway.


Starting up postgresql would probably result in more disk block changes
 that will result in more work next time we rsync.


This is normal behavior of rsync. It all depends on how volatile is your
system and volume of changes performed.


 How badly can we screw things up, given we intend to perform a final rsync
 with no postgresql services running? What should we try and avoid doing,
 and why?


 We might simply compare some hashes between the two systems, of some files
 that haven't had their last-modified dates changed since the last rsync.


All this will be taken care by Postgres with the help of WAL archive files
generated at the time when you performed rsync with postgres services up
and running.

Thanks
VB


Re: [GENERAL] Server move using rsync

2011-11-01 Thread Alan Hodgson
On October 31, 2011 03:01:19 PM Stephen Denne wrote:
 I'm wondering whether it's worth doing anyway, simply to check that it
 doesn't do something completely unexpected, which would presumably alert
 us to something we hadn't considered.
 

Testing is always worthwhile, if only to ensure that PostgreSQL will actually 
run with your configuration on the new machine (sufficient shared memory, IP 
addresses specified in postgresql.conf, etc). 

However, assuming the PostgreSQL binary packages you're using are identical, 
and assuming that you aren't changing tablespace pointers around, the rsync / 
restart is pretty fool-proof in terms of reliably copying PostgreSQL itself. 
PostgreSQL is good about updating time stamps on modified files, you don't have 
to worry about needing the full compare options on rsync or anything  -avr --
delete is generally sufficient .

You might disable WAL archiving during a test startup to avoid sending 
duplicates to your backup server.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server move using rsync

2011-10-31 Thread Stephen Denne
Thanks for sharing your experience and thoughts Venkat,

Venkat Balaji said:

 We are performing backups to our production server exactly the same way. We 
 have been through some problems while restoring and bringing up the database. 
 If you are planning to take initial complete rsync with subsequent 
 incremental rsyncs, then you need to make sure that you have all the WAL 
 archives starting from the initial rsync on Day 1. 
 
 Also are you doing the following?
 
 1. pg_start_backup() - rsync - pg_stop_backup() ?
 2. Please let us know your WAL Archive backup strategy.


We're not doing this long-term, in order to have a backup server we can 
fail-over to, but rather as a one-off low impact move of our database. 
Consequently, instead of using pg_start_backup and pg_stop_backup, and keeping 
all WAL, we're stopping the database, rsync of everything, and starting the 
database in the new server, with it appearing to the new server (if it was 
capable of noticing such things) that it had simply been shutdown and restarted.

The initial and repeated rsyncs while the first server is running and in use, 
are solely in order to reduce the time that the rsync takes while the 
postgresql application is stopped.

Do you still think we need to do anything special with pg_start_backup, 
pg_stop_backup, and WAL archives?

 Is there any way during that week, that we can verify whether our partially 
 completed database move process is going to result in a  database that 
 starts up ok?
 
 In general, yes, database can start up normally. Without WAL Archives, 
 recovering to a particular time would not be possible.

Without doing pg_start_backup, and with rsync not performing a snapshot 
backup, my assumption is that until we do an rsync with the service shutdown, 
whatever we've got at the location we're copying to, is not self-consistent.

If we start up postgresql on it, won't it think it is recovering from a sudden 
crash? I think it may either appear to recover ok, or complain about various 
things, and not start up ok, with neither option providing us with much 
insight, as all that could tell us is that either some disk blocks are 
consistent, or some are not, which is our starting assumption anyway.

Starting up postgresql would probably result in more disk block changes that 
will result in more work next time we rsync.

I'm wondering whether it's worth doing anyway, simply to check that it doesn't 
do something completely unexpected, which would presumably alert us to 
something we hadn't considered.

How badly can we screw things up, given we intend to perform a final rsync with 
no postgresql services running? What should we try and avoid doing, and why?

We might simply compare some hashes between the two systems, of some files that 
haven't had their last-modified dates changed since the last rsync.

Regards,
Stephen.
This email with any attachments is confidential and may be subject to legal 
privilege. If it is not intended for you please advise by replying immediately, 
destroy it and do not copy, disclose or use it in any way.

Please consider the environment before printing this e-mail
__
  This email has been scanned by the DMZGlobal Business Quality
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server move using rsync

2011-10-28 Thread Venkat Balaji

 Another option is to use rsync to perform a file system backup. This is
 done by first running rsync while the database server is running, then
 shutting down the database server just long enough to do a second rsync. The
 second rsync will be much quicker than the first, because it has relatively
 little data to transfer, and the end result will be consistent because the
 server was down. This method allows a file system backup to be performed
 with minimal downtime.

 Except that we plan on an initial rsync which we think might take a couple
 of days, then subsequent daily rsyncs for up to a week to keep it up to date
 till we stop the old database, rsync again, and start the new database.


We are performing backups to our production server exactly the same way. We
have been through some problems while restoring and bringing up the
database. If you are planning to take initial complete rsync with subsequent
incremental rsyncs, then you need to make sure that you have all the WAL
archives starting from the initial rsync on Day 1.

Also are you doing the following?

1. pg_start_backup() - rsync - pg_stop_backup() ?
2. Please let us know your WAL Archive backup strategy.

Is there any way during that week, that we can verify whether our partially
 completed database move process is going to result in a database that starts
 up ok?


In general, yes, database can start up normally. Without WAL Archives,
recovering to a particular time would not be possible.

Thanks
VB


[GENERAL] Server move using rsync

2011-10-27 Thread Stephen Denne
We're intending to move a 470GB PostgreSQL 8.3.13 database using the following 
technique from http://www.postgresql.org/docs/8.3/interactive/backup-file.html
 
Another option is to use rsync to perform a file system backup. This is done 
by first running rsync while the database server is running, then shutting down 
the database server just long enough to do a second rsync. The second rsync 
will be much quicker than the first, because it has relatively little data to 
transfer, and the end result will be consistent because the server was down. 
This method allows a file system backup to be performed with minimal downtime.

Except that we plan on an initial rsync which we think might take a couple of 
days, then subsequent daily rsyncs for up to a week to keep it up to date till 
we stop the old database, rsync again, and start the new database.

A very rough approximation of our database would be half a dozen large tables 
taking up 1/3 of the disk space, and lots of indexes on those tables taking the 
other 2/3 of the space.

If we assume usage characteristics of:
Much less than 1% of indexed data changing per day, with almost all of those 
updates being within the 1% of most recently added data.
Much less than 1% of historical indexed data being deleted per day with most of 
the deletions expected to affect sets of contiguous file pages.
About 1% of new indexed data added per day

I'm curious of the impact of vacuum (automatic and manual) during that process 
on expected amount of work rsync will have to do, and time it will take, and on 
what the update pattern is on files of Btree indexes.

Is it worth making sure vacuum is not run, in order to reduce the amount of 
files that change during that period?

Do a number of additions evenly spread through the domain of an indexed field's 
values result in localized changes to the indexes files, or changes throughout 
the files?

How about for additions to the end of the domain of an indexed field's values 
(e.g. adding current dates)?

Is there any way during that week, that we can verify whether our partially 
completed database move process is going to result in a database that starts up 
ok?

Regards, Stephen Denne.
This email with any attachments is confidential and may be subject to legal 
privilege. If it is not intended for you please advise by replying immediately, 
destroy it and do not copy, disclose or use it in any way.

Please consider the environment before printing this e-mail
__
  This email has been scanned by the DMZGlobal Business Quality
  Electronic Messaging Suite.
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server move using rsync

2011-10-27 Thread Scott Marlowe
On Thu, Oct 27, 2011 at 7:37 PM, Stephen Denne
stephen.de...@datam.co.nz wrote:
 We're intending to move a 470GB PostgreSQL 8.3.13 database using the 
 following technique from 
 http://www.postgresql.org/docs/8.3/interactive/backup-file.html

 Another option is to use rsync to perform a file system backup. This is done 
 by first running rsync while the database server is running, then shutting 
 down the database server just long enough to do a second rsync. The second 
 rsync will be much quicker than the first, because it has relatively little 
 data to transfer, and the end result will be consistent because the server 
 was down. This method allows a file system backup to be performed with 
 minimal downtime.

 Except that we plan on an initial rsync which we think might take a couple of 
 days, then subsequent daily rsyncs for up to a week to keep it up to date 
 till we stop the old database, rsync again, and start the new database.

Sounds reasonable.  don't forget the --delete switch or the
destination will just keep growing and growing.

 A very rough approximation of our database would be half a dozen large tables 
 taking up 1/3 of the disk space, and lots of indexes on those tables taking 
 the other 2/3 of the space.

 If we assume usage characteristics of:
 Much less than 1% of indexed data changing per day, with almost all of those 
 updates being within the 1% of most recently added data.
 Much less than 1% of historical indexed data being deleted per day with most 
 of the deletions expected to affect sets of contiguous file pages.
 About 1% of new indexed data added per day

 I'm curious of the impact of vacuum (automatic and manual) during that 
 process on expected amount of work rsync will have to do, and time it will 
 take, and on what the update pattern is on files of Btree indexes.

 Is it worth making sure vacuum is not run, in order to reduce the amount of 
 files that change during that period?

Probably not.  You can test that theory by turning off vacuum for a
day to see how much of a change it makes.   My semi-educated
scientific wild-assed guess is it won't make any difference, since the
file / block will be changed with or without the vacuum, and still
have to be copied.,

 Is there any way during that week, that we can verify whether our partially 
 completed database move process is going to result in a database that starts 
 up ok?

Try starting it up?

In general, the lower the traffic when you rsync the better the
chances, but honestly if you're not stopping the database then you
shouldn't count on luck to make it work.  Note that you CAN do the
whole rsync followed by setting up PITR to get a coherent database
backup that is guaranteed to start up, assuming you've followed all
the instructions on how to set up PITR properly.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general