Josh Berkus <j...@agliodbs.com> wrote:

> Currently, if archive_command is failing, pg_stop_backup() will hang
> forever. The only way to figure out what's wrong with pg_stop_backup()
> is to tail the PostgreSQL logs. This is difficult for users to
> troubleshoot, and strongly resists any kind of automation.

That is bad.

> Yes, we can work around this by setting statement_timeout, but that has
> two issues (a) the user has to remember to do it before the problem
> occurs, and (b) it won't differentiate between archive failure and other
> reasons it might time out. Clearly not a long-term solution.
>
> As such, I propose that pg_stop_backup() should error with an
> appropriate error message ("Could not archive WAL segments") after
> three archiving attempts. We could also add an optional parameter to
> raise the number of attempts from the default of three.

That sounds sane to me.

> An alternative, if we were doing this from scratch, would be for
> pg_stop_backup to return false or -1 or something if it couldn't
> archive; there are reasons why a user might not care that
> archive_command was failing (shared storage comes to mind). However,
> that would be a surprising break with backwards compatibility, since
> currently users don't check the result value of pg_stop_backup().

Some might, which is a stronger argument against changing what gets
returned. Even in a green field, though, I would argue that
pg_stop_backup() should return information about the minimum range of
WAL files needed to perform a consistent recovery -- or possibly
duplicate everything in the backup history file. An error seems much
more appropriate to indicate that the user does not have a valid
backup.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company