[ On Thursday, June 20, 2002 at 22:17:55 (+0200), Ragnar Kjørstad wrote: ]
> Subject: Re: Backing up PostgreSQL?
>
> On Thu, Jun 20, 2002 at 01:11:09PM -0400, Greg A. Woods wrote:
> > [ On Wednesday, June 19, 2002 at 23:53:14 (+0200), Ragnar Kjørstad wrote: ]
> > > Subject: Re: Backing up PostgreSQL?
> > >
> > > By this definition postgresql is consistant at all times
> > 
> > That's simply not possible to be true.  PostgreSQL uses multiple files
> > in the filesystem namespace to contain its data -- sometimes even
> > multiple files per table.  It is literally impossible, with the POSIX
> > file I/O interfaces, to guarantee that concurrent writes to multiple
> > files will all complete at the same time.  Remember I've only been
> > talking about the backed up files being in a self-consistent state and
> > not requiring roll-back or roll-forward of any transaction logs after
> > restore.
> 
> Puh - we've been through this already! Postgresql doesn't need this
> guarantee, because it writes to it's log to avoid this very problem!

You haven't been paying attention to what I've been saying:

   Remember I've only been talking about the backed up files being in a
   self-consistent state and not requiring roll-back or roll-forward of
   any transaction logs after restore.

In order to have consistent (from a database and application
perspective) files on a filesystem backup of the database files, you
MUST -- ABSOLUTELY MUST -- find some mechanism to synchronise between
the database and the filesystem for the duration of the time it takes to
make the filesystem backp, whether thats the few seconds it takes to
create a "snapshot", or the many minutes it takes to make a verbatim
backup using 'dump' or whatever.  If all contingencies are to be covered
the only sure way to do this is to shut down the database engine and
ensure it has closed all of its files.  There must be no writes to the
filesystem by the database process(es) during the time the backup or
snapshot being made.  None.  Not to the database files, nor to the
transaction log.  None whatsoever.

With great care and a separate backup of the database transaction log
taken after the filesystem backup it may be possible to re-build
database consistency after a restore, but I wouldn't ever want to risk
having to do that in a disaster recovery scenario.  I would either want
a guaranteed self-consistent filesystem copy of the database files, or a
properly co-ordinated pg_dump of the database contents (and preferrably
I want both, and both representing the exact same state, though here
there's more leeway for using a transaction log to record db state
changes between one form of backup and the other :-)

In the end if your DBA and/or developers have not taken into account the
need to shut down the database for at least a short time on a regular
basis in order to obtain good backups then you may have more serious
problems on your hands (and you should find a new and more experienced
DBA too! ;-).  The literature is full of ways of making secure backups
of transactions -- but such techniques need to be considered in the
design of your systems.  For example it's not unheard of in financial
applications to run the transaction log direct to a hardware-mirrored
tape drive, pausing the database engine (but not stopping it) only long
enough to change tapes, and immediately couriering one tape to a secure
off-site location.  Full backups with the database shut down are then
taken only on much longer intervals and disaster recovery involves
replaying all the transactions since the last full backup.  There are
also tricks you can play with RAID-1 which are much faster and
potentially safer than OS-level filesystem snapshots (and of course such
tricks don't require snapshot support, which is still relatively rare
and unproven in many of the systems where it is available).  These
tricks allow you to get away with shutting down the database only so
long as it takes to swap the mirror disk from one set to another, at
which point you can make a leisurely backup of the quiescent side of the
mirror.  Then the RAID hardware can do the reconstruction of the mirror
while the database remains live.

> Here is the close-code on linux:

As far as I know the Linux kernel does not have support for anything
quite exactly resembling the Unix filesystem.  Your analysis of the code
you quoted would seem to confirm that too.  :-)

> Just to make sure there is no (additional) confusion here; what I'm
> saying is:
> 1. Meta-data must be updated properly. This is obvious and 
>    shouldn't require futher explanation...
> 2. non-journaling filesystems (e.g. ext2 on linux) do update
>    the inode-metadata on fsync(), but they do not update the
>    directory. 

The Unix Fast File System is not a log/journaling filesystem.  However
it does not suffer the problems you're worried about.

Wasn't this question originally about FreeBSD anyway?

-- 
                                                                Greg A. Woods

+1 416 218-0098;  <[EMAIL PROTECTED]>;  <[EMAIL PROTECTED]>;  <[EMAIL PROTECTED]>
Planix, Inc. <[EMAIL PROTECTED]>; VE3TCP; Secrets of the Weird <[EMAIL PROTECTED]>

Reply via email to