On Thu, Nov 29, 2018 at 5:40 AM Stephen Frost <sfr...@snowman.net> wrote:
>
> Greetings,
>
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Wed, Nov 28, 2018 at 11:00:31AM +0000, PG Doc comments form wrote:
> > > For the archive command:
> > > <=128 There are no errors in the PostgreSQL log (messages with severity
> > > equal to or higher than ERROR). First there are 3 messages of type LOG
> > > about the failure, then a WARNING about this and a pause for 1 minute,
> > > then it repeats.
> > > >=129 FATAL error in the PostgreSQL log. The message is about stopping
> > > the archive process, but not the database. Repeated after roughly
> > > 16 seconds.
> >
> > This code has been around for some time, and comes from this commit:
> > commit: 3ad0728c817bf8abd2c76bd11d856967509b307c
> > author: Tom Lane <t...@sss.pgh.pa.us>
> > date: Tue, 21 Nov 2006 20:59:53 +0000
> > committer: Tom Lane <t...@sss.pgh.pa.us>
> > date: Tue, 21 Nov 2006 20:59:53 +0000
> >
> > On systems that have setsid(2) (which should be just about everything
> > except Windows), arrange for each postmaster child process to be its own
> > process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the
> > whole process group not only the direct child process. This provides
> > saner behavior for archive and recovery scripts; in particular, it's
> > possible to shut down a warm-standby recovery server using
> > "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup
> > subprocess will result in killing the waiting recovery_command. Also,
> > this makes Query Cancel and statement_timeout apply to scripts being run
> > from backends via system(). (There is no support in the core backend for
> > that, but it's widely done using untrusted PLs.) Per gripe from Stephen
> > Harris and subsequent discussion.
> >
> > The relevant part is pgarch_archiveXlog() in pgarch.c, and this part
> > is most relevant:
> > * Per the Single Unix Spec, shells report exit status > 128 when a
> > * called command died on a signal.
> >
> > > In this case PostgreSQL tries to conform to the rules for return codes
> > > of a unix shell. A unix shell returns 126 in the case of "command not
> > > executable", 127 in the case of "command not found", and 128 + the
> > > signal number if the application was interrupted by an uncaught signal.
> >
> > If you were to rewrite those paragraphs or make them more precise, how
> > would you actually shape your suggestions? I personally quite like the
> > current formulations, but I am rather used to it to be honest.
>
> This is another example, at least imv, of why we really need to move
> away from archive_command as an interface for doing WAL archiving.
+1

> Having discussed this quite a bit lately with David Steele and Magnus,
> it's pretty clear that we need to completely rip out how this works
> today and rewrite it based around an extension model where a background
> worker can start up and essentially take the place of the archiver
> process, with flexibility to jump forward through the WAL stream,
> communicate clearly with other processes, handle failure to do so
> gracefully based on the specific cases, etc.
>
> We could then possibly write an extension to be included that mimics
> what archive_command does today, but imv we should immediately consider
> it deprecated and encourage people to move off of it.
>
> Thanks!
>
> Stephen

-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
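
For reference, the exit-status convention quoted in this thread can be sketched in Python. This is only an illustration, not PostgreSQL's code: the function names describe_exit_status and archiver_log_level are hypothetical, and the LOG/FATAL split simply mirrors the behaviour reported in the doc comment (status <=128 logged as LOG, >=129 as FATAL).

```python
import signal
from typing import Optional


def describe_exit_status(status: int) -> str:
    """Interpret a shell exit status using the convention quoted in the thread."""
    if status == 0:
        return "success"
    if status == 126:
        return "command not executable"
    if status == 127:
        return "command not found"
    if status > 128:
        # 128 + N means the command died on signal N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "command failed"


def archiver_log_level(status: int) -> Optional[str]:
    """Assumed mapping, taken from the report above: >128 is FATAL, else LOG."""
    if status == 0:
        return None  # nothing to report on success
    return "FATAL" if status > 128 else "LOG"


if __name__ == "__main__":
    for code in (0, 1, 126, 127, 128, 130):
        print(code, archiver_log_level(code), describe_exit_status(code))
```

Under this reading, an archive_command script that wants a plain retry should exit with a status of 128 or lower; anything higher is treated as death by signal.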