On Jul 16, 2009, at 11:09 AM, Greg Stark wrote:

On Thu, Jul 16, 2009 at 4:41 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
Rick Gigger wrote:
If you used an rsync-like algorithm for the base backups, wouldn't that
increase the database size for which it would still be practical to just
re-sync? Couldn't you in fact sync a very large database if the actual
change in the files was a small percentage of the total size?

It would certainly help to reduce the network traffic, though you'd
still have to scan all the data to see what has changed.

The fundamental problem with pushing users to start over with a new
base backup is that there's no relationship between the size of the
WAL and the size of the database.

You can plausibly have a system with an extremely high transaction rate
generating WAL very quickly, but where the whole database fits in a few
hundred megabytes. In that case you could be behind by only a few
minutes and still have it be faster to take a new base backup.

Or you could have a petabyte database which is rarely updated, in which
case it might be faster to apply weeks' worth of logs than to try to
take a new base backup.

Only the sysadmin is actually going to know which makes more sense,
unless we start tying WAL parameters to the database size or something
like that.

Once again, wouldn't an rsync-like algorithm help here? Couldn't the default be to just create a new base backup, but then allow you to point at an existing base backup if you've already got one?
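To make that concrete, here is a rough Python sketch of the block-signature comparison I have in mind. It's a simplification of the real rsync algorithm (fixed, offset-aligned blocks only; real rsync adds a weak rolling checksum so it can also match data that has shifted), and everything in it -- the paths, the 8 kB block size, the demo at the bottom -- is purely illustrative, not anything Postgres does today:

import hashlib

BLOCK_SIZE = 8192  # matches Postgres's default page size; any fixed size works

def block_signatures(path):
    """Scan a file and return a strong checksum for each block.

    Note that the whole file still gets read -- the saving is in
    network traffic, not local I/O, as Heikki points out above.
    """
    sigs = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sigs.append(hashlib.sha1(block).hexdigest())
    return sigs

def changed_blocks(old_sigs, new_path):
    """Yield (block_number, data) for each block of new_path that differs
    from the stale copy's signatures.  Handling of truncation on the new
    side is omitted for brevity."""
    with open(new_path, "rb") as f:
        blkno = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            if (blkno >= len(old_sigs)
                    or hashlib.sha1(block).hexdigest() != old_sigs[blkno]):
                yield blkno, block  # only these would cross the wire
            blkno += 1

if __name__ == "__main__":
    # Self-contained demo: two 4-block files differing in one block.
    import os, tempfile
    with tempfile.TemporaryDirectory() as d:
        old_path = os.path.join(d, "old")
        new_path = os.path.join(d, "new")
        data = bytearray(os.urandom(BLOCK_SIZE * 4))
        with open(old_path, "wb") as f:
            f.write(data)
        data[BLOCK_SIZE:BLOCK_SIZE + 4] = b"XXXX"  # dirty block 1
        with open(new_path, "wb") as f:
            f.write(data)
        sigs = block_signatures(old_path)
        dirty = [blkno for blkno, _ in changed_blocks(sigs, new_path)]
        print("blocks to transfer:", dirty)  # prints: blocks to transfer: [1]

The standby would compute the signatures of whatever stale copy it has and ship them over, and the primary would send back only the blocks that differ. The signature exchange isn't free either: at 20 bytes of SHA-1 per 8 kB block it's roughly a quarter of a percent of the database size, which is presumably why rsync grows its block size with file size.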
