Simon Riggs wrote:
> > Agreed we want to allow the superuser control over writing of the
> > archive logs.  The question is how do they get access to that.  Is it
> > by running a client program continuously or calling an interface
> > script from the backend?
> >
> > My point was that having the backend call the program has improved
> > reliability and control over when to write, and easier
> > administration.
>
> Agreed. We've both suggested ways that can occur, though I suggest
> this is much less of a priority, for now. Not "no", just not "now".
>
> > Another case is server start/stop.  You want to start/stop the
> > archive logger to match the database server, particularly if you
> > reboot the server.  I know Informix used a client program for
> > logging, and it was a pain to administer.
>
> pg_arch is just icing on top of the API. The API is the real deal
> here. I'm not bothered if pg_arch is not accepted, as long as we can
> adopt the API. As noted previously, my original intention was to
> split the API away from the pg_arch application to make it clearer
> what was what. Once that has been done, I encourage others to improve
> pg_arch - but also to use the API to interface with other BAR (backup
> and recovery) products.
>
> If you're using PostgreSQL for serious business then you will be
> using a serious BAR product as well. There are many FOSS
> alternatives...
>
> The API's purpose is to allow larger, pre-existing BAR products to
> know when and how to retrieve data from PostgreSQL. Those products
> don't and won't run underneath the postmaster, so although I agree
> with Peter's original train of thought, I also agree with Tom's
> suggestion that we need an API more than we need an archiver process.
>
> > I would be happy with an external program if it was started/stopped
> > by the postmaster (or via GUC change) and received a signal when a
> > WAL file was written.
>
> That is exactly what has been written.
>
> The PostgreSQL side of the API is written directly into the backend,
> in xlog.c, and is therefore activated by postmaster-controlled code.
> That then sends "a signal" to the process that will do the archiving;
> the archiver side of the XLogArchive API is an in-process library.
> (The "signal" is, in fact, a zero-length file written to disk,
> because there are many reasons why an external archiver may not be
> ready to archive, or even up and running to receive a signal.)
>
> The only difference is that there is some confusion as to the role
> and importance of pg_arch.
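As an illustration of that zero-length-file handshake, here it is in
shell terms. The directory name, segment name, and do_archive helper
are invented for the example; the real XLogArchive API defines its own
conventions:

    # Backend side: a WAL segment has just been filled, so announce it
    # by creating an empty notification file.
    touch $PGDATA/pg_xlog/archive_notify/0000000100000004

    # Archiver side: scan for notifications whenever it happens to be
    # running; it need not be alive when the backend creates the file.
    for n in $PGDATA/pg_xlog/archive_notify/*; do
        [ -e "$n" ] || continue                 # nothing pending
        wal=$(basename "$n")
        # do_archive is a stand-in for whatever copies the segment away.
        do_archive "$PGDATA/pg_xlog/$wal" && rm "$n"
    done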
OK, I have finalized my thinking on this. We both agree that a pg_arch
client-side program certainly works for PITR logging. The big question
in my mind is whether a client-side program is what we want to use
long-term, and whether we want to release a 7.5 that uses it and then
change it in 7.6 to something more integrated into the backend.

Let me add that this is a little different from pg_autovacuum. With
that, you could put it in cron and be done with it. With pg_arch, there
is a procedure that has to be followed to do PITR, and if we change
that procedure in 7.6, I am afraid there will be confusion.

Let me also add that I am not terribly worried about having the
ability to restore to an arbitrary point in time in 7.5. I would much
rather have a good PITR archiving setup that works cleanly in 7.5 and
add arbitrary-point restore in 7.6, than have restore to an arbitrary
point now but a strained implementation that we have to revisit in 7.6.

Here are my ideas. (I talked to Tom about this and am including his
ideas too.) Basically, the archiver that scans the xlog directory to
identify files to be archived should be a subprocess of the
postmaster. You already have that code, and it can be moved into the
backend.

Here is my implementation idea. First, your pg_arch code runs in the
backend and is started just like the statistics process. It has to be
started whether PITR is being used or not, but it stays inactive if
PITR isn't enabled. This is necessary because we can't have a backend
start the process later if someone turns on PITR after server start.
The process id of the archiver process is stored in shared memory.

When PITR is turned on, each backend that completes a WAL file sends a
signal to the archiver process. The archiver wakes up on the signal,
scans the directory, finds the files that need archiving, and either
does a 'cp' or runs a user-defined program (like scp) to transfer each
file to the archive location.

In GUC we add:

	pitr = true/false
	pitr_location = 'directory, user@host:/dir, etc.'
	pitr_transfer = 'cp, scp, etc.'

The archiver updates its configuration when someone changes these
values in postgresql.conf (and runs pg_ctl reload). They can only be
modified from postgresql.conf; changing them via SET has to be
disabled because, like the port number or checkpoint_segments, they
are cluster-level settings, not per-session ones.

Basically, I think we need to push user-level control of this process
down past the directory-scanning code (that part is pretty standard)
and let users call an arbitrary program to transfer the logs. My idea
is that the pitr_transfer program is called with $1 = WAL file name
and $2 = pitr_location, and the program can use those two arguments to
do the transfer. We can even put a pitr_transfer.sample program in
share/ and document $1 and $2 there.
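For example, with those three settings, the relevant postgresql.conf
lines might read as follows (the values are purely illustrative):

    pitr = true
    pitr_location = 'backup@vault:/pg_archive'  # or a plain local directory
    pitr_transfer = 'scp'                       # 'cp' for local copies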
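A pitr_transfer.sample could then be as small as the sketch below,
which assumes $2 is a local directory and so uses cp; an scp-based
site would substitute its own command. (Whether the archiver retries
on a nonzero exit status is a detail still to be decided.)

    #!/bin/sh
    # pitr_transfer.sample: invoked by the archiver as
    #     pitr_transfer <WAL file name> <pitr_location>
    # This assumes $1 arrives as a full path (or that the archiver
    # runs the script from the pg_xlog directory).
    wal="$1"        # $1 = WAL file name
    dest="$2"       # $2 = pitr_location

    # Insist on both arguments before touching anything.
    [ -n "$wal" ] && [ -n "$dest" ] || exit 1

    # Copy the segment, reporting failure through the exit status so
    # the archiver knows the file has not been safely archived.
    cp "$wal" "$dest"/ || exit 1
    exit 0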