Simon Riggs wrote:
> > Agreed we want to allow the superuser control over writing of the
> > archive logs.  The question is how do they get access to that.  Is it
> > by running a client program continuously or calling an interface
> > script from the backend?
> >
> > My point was that having the backend call the program has improved
> > reliability and control over when to write, and easier
> > administration.
>
> Agreed. We've both suggested ways that can occur, though I suggest
> this is much less of a priority, for now. Not "no", just not "now".
>
> > Another case is server start/stop.  You want to start/stop the
> > archive logger to match the database server, particularly if you
> > reboot the server.  I know Informix used a client program for
> > logging, and it was a pain to administer.
>
> pg_arch is just icing on top of the API. The API is the real deal
> here. I'm not bothered if pg_arch is not accepted, as long as we can
> adopt the API. As noted previously, my original intention was to
> split the API away from the pg_arch application to make it clearer
> what was what. Once that has been done, I encourage others to improve
> pg_arch - but also to use the API to interface with other BAR (backup
> and recovery) products.
>
> If you're using PostgreSQL for serious business then you will be
> using a serious BAR product as well. There are many FOSS
> alternatives...
>
> The API's purpose is to allow larger, pre-existing BAR products to
> know when and how to retrieve data from PostgreSQL. Those products
> don't and won't run underneath the postmaster, so although I agree
> with Peter's original train of thought, I also agree with Tom's
> suggestion that we need an API more than we need an archiver process.
>
> > I would be happy with an external program if it was started/stopped
> > by the postmaster (or via GUC change) and received a signal when a
> > WAL file was written.
>
> That is exactly what has been written.
>
> The PostgreSQL side of the API is written directly into the backend,
> in xlog.c, and is therefore activated by postmaster-controlled code.
> That then sends "a signal" to the process that will do the archiving;
> the archiver side of the XLogArchive API is an in-process library.
> (The "signal" is, in fact, a zero-length file written to disk,
> because there are many reasons why an external archiver may not be
> ready to archive, or even up and running to receive a signal.)
>
> The only difference is that there is some confusion as to the role
> and importance of pg_arch.
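As an illustration of that zero-length-file handshake, here it is in
shell terms. The directory name, segment name, and do_archive helper
are invented for the example; the real XLogArchive API defines its own
conventions:

    # Backend side: a WAL segment has just been filled, so announce it
    # by creating an empty notification file.
    touch $PGDATA/pg_xlog/archive_notify/0000000100000004

    # Archiver side: scan for notifications whenever it happens to be
    # running; it need not be alive when the backend creates the file.
    for n in $PGDATA/pg_xlog/archive_notify/*; do
        [ -e "$n" ] || continue                 # nothing pending
        wal=$(basename "$n")
        # do_archive is a stand-in for whatever copies the segment away.
        do_archive "$PGDATA/pg_xlog/$wal" && rm "$n"
    done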
OK, I have finalized my thinking on this. We both agree that a pg_arch
client-side program certainly works for PITR logging. The big question
in my mind is whether a client-side program is what we want to use
long-term, and whether we want to release a 7.5 that uses it and then
change it in 7.6 to something more integrated into the backend.

Let me add that this is a little different from pg_autovacuum. With
that, you could put it in cron and be done with it. With pg_arch, there
is a procedure that has to be followed to do PITR, and if we change
that procedure in 7.6, I am afraid there will be confusion.

Let me also add that I am not terribly worried about having the
ability to restore to an arbitrary point in time in 7.5. I would much
rather have a good PITR archiving setup that works cleanly in 7.5 and
add arbitrary-point restore in 7.6, than have restore to an arbitrary
point now but a strained implementation that we have to revisit in 7.6.

Here are my ideas. (I talked to Tom about this and am including his
ideas too.) Basically, the archiver that scans the xlog directory to
identify files to be archived should be a subprocess of the
postmaster. You already have that code, and it can be moved into the
backend.

Here is my implementation idea. First, your pg_arch code runs in the
backend and is started just like the statistics process. It has to be
started whether PITR is being used or not, but it stays inactive if
PITR isn't enabled. This is necessary because we can't have a backend
start the process later if someone turns on PITR after server start.
The process id of the archiver process is stored in shared memory.

When PITR is turned on, each backend that completes a WAL file sends a
signal to the archiver process. The archiver wakes up on the signal,
scans the directory, finds the files that need archiving, and either
does a 'cp' or runs a user-defined program (like scp) to transfer each
file to the archive location.

In GUC we add:

	pitr = true/false
	pitr_location = 'directory, user@host:/dir, etc.'
	pitr_transfer = 'cp, scp, etc.'

The archiver updates its configuration when someone changes these
values in postgresql.conf (and runs pg_ctl reload). They can only be
modified from postgresql.conf; changing them via SET has to be
disabled because, like the port number or checkpoint_segments, they
are cluster-level settings, not per-session ones.

Basically, I think we need to push user-level control of this process
down past the directory-scanning code (that part is pretty standard)
and let users call an arbitrary program to transfer the logs. My idea
is that the pitr_transfer program is called with $1 = WAL file name
and $2 = pitr_location, and the program can use those two arguments to
do the transfer. We can even put a pitr_transfer.sample program in
share/ and document $1 and $2 there.
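For example, with those three settings, the relevant postgresql.conf
lines might read as follows (the values are purely illustrative):

    pitr = true
    pitr_location = 'backup@vault:/pg_archive'  # or a plain local directory
    pitr_transfer = 'scp'                       # 'cp' for local copies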
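A pitr_transfer.sample could then be as small as the sketch below,
which assumes $2 is a local directory and so uses cp; an scp-based
site would substitute its own command. (Whether the archiver retries
on a nonzero exit status is a detail still to be decided.)

    #!/bin/sh
    # pitr_transfer.sample: invoked by the archiver as
    #     pitr_transfer <WAL file name> <pitr_location>
    # This assumes $1 arrives as a full path (or that the archiver
    # runs the script from the pg_xlog directory).
    wal="$1"        # $1 = WAL file name
    dest="$2"       # $2 = pitr_location

    # Insist on both arguments before touching anything.
    [ -n "$wal" ] && [ -n "$dest" ] || exit 1

    # Copy the segment, reporting failure through the exit status so
    # the archiver knows the file has not been safely archived.
    cp "$wal" "$dest"/ || exit 1
    exit 0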