Re: [HACKERS] Online base backup from the hot-standby

Fujii Masao Mon, 26 Sep 2011 22:52:12 -0700

On Tue, Sep 27, 2011 at 11:56 AM, Fujii Masao <[email protected]> wrote:
>> In backup.sgml  the new section titled "Making a Base Backup during
>> Recovery"  I would prefer to see some mention in the title that this
>> procedure is for standby servers ie "Making a Base Backup from a Standby
>> Database".  Users who have setup a hot-standby database should be familiar
>> with the 'standby' terminology. I agree that the "during recovery"
>> description is technically correct but I'm not sure someone who is looking
>> through the manual for instructions on making a base backup from here
>> standby will realize this is the section they should read.
>
> I used the term "recovery" rather than "standby" because we can take
> a backup even from the server in normal archive recovery mode but not
> standby mode. But there is not many users who take a backup during
> normal archive recovery, so I agree that the term "standby" is better to
> be used in the document. Will change.


Done.

>> Around line 969 where you give an example of copying the control file I
>> would be a bit clearer that this is an example command.  Ie (Copy the
>> pg_control file from the cluster directory to the global sub-directory of
>> the backup.  For example "cp $PGDATA/global/pg_control
>> /mnt/server/backupdir/global")
>
> Looks better. Will change.

Done.

>> or give an error message on
>> pg_stop_backup() saying that the base backup won't be usable.  The above
>> error doesn't really tell the user why there is a mismatch.
>
> What about the following error message?
>
> ERROR:  pg_stop_backup() was executed during normal processing though
> pg_start_backup() was executed during recovery
> HINT:  The database backup will not be usable.

Done. I attached the new version of the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 935,940 **** SELECT pg_stop_backup();
--- 935,1000 ----
     </para>
    </sect2>
  
+   <sect2 id="backup-from-standby">
+    <title>Making a Base Backup from Standby Database</title>
+ 
+    <para>
+     It's possible to make a base backup during recovery. Which allows a user
+     to take a base backup from the standby to offload the expense of
+     periodic backups from the master. Its procedure is similar to that
+     during normal running. All these steps must be performed on the standby.
+   <orderedlist>
+    <listitem>
+     <para>
+      Ensure that hot standby is enabled (see <xref linkend="hot-standby">
+      for more information).
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Connect to the database as a superuser and execute <function>pg_start_backup</>.
+      This performs a restartpoint if there is at least one checkpoint record
+      replayed since last restartpoint.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Perform a file system backup.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Copy the pg_control file from the cluster directory to the global
+      sub-directory of the backup. For example:
+ <programlisting>
+ cp $PGDATA/global/pg_control /mnt/server/backupdir/global
+ </programlisting>
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Again connect to the database as a superuser, and execute
+      <function>pg_stop_backup</>. This terminates the backup mode, but does not
+      perform a switch to the next WAL segment, create a backup history file and
+      wait for all required WAL segments to be archived,
+      unlike that during normal processing.
+     </para>
+    </listitem>
+   </orderedlist>
+    </para>
+ 
+    <para>
+     You cannot use the <application>pg_basebackup</> tool to take the backup
+     from the standby.
+    </para>
+    <para>
+     It's not possible to make a base backup from the server in recovery mode
+     when reading WAL written during a period when <varname>full_page_writes</>
+     was disabled. If you want to take a base backup from the standby,
+     <varname>full_page_writes</> must be set to true on the master.
+    </para>
+   </sect2>
+ 
    <sect2 id="backup-pitr-recovery">
     <title>Recovering Using a Continuous Archive Backup</title>
  
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1680,1685 **** SET ENABLE_SEQSCAN TO OFF;
--- 1680,1693 ----
         </para>
  
         <para>
+         WAL written while <varname>full_page_writes</> is disabled does not
+         contain enough information to make a base backup during recovery
+         (see <xref linkend="backup-from-standby">),
+         so <varname>full_page_writes</> must be enabled on the master
+         to take a backup from the standby.
+        </para>
+ 
+        <para>
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
          The default is <literal>on</>.
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 14014,14020 **** SELECT set_config('log_statement_stats', 'off', false);
     <para>
      The functions shown in <xref
      linkend="functions-admin-backup-table"> assist in making on-line backups.
!     These functions cannot be executed during recovery.
     </para>
  
     <table id="functions-admin-backup-table">
--- 14014,14021 ----
     <para>
      The functions shown in <xref
      linkend="functions-admin-backup-table"> assist in making on-line backups.
!     These functions except <function>pg_start_backup</> and <function>pg_stop_backup</>
!     cannot be executed during recovery.
     </para>
  
     <table id="functions-admin-backup-table">
***************
*** 14094,14100 **** SELECT set_config('log_statement_stats', 'off', false);
      database cluster's data directory, performs a checkpoint,
      and then returns the backup's starting transaction log location as text.
      The user can ignore this result value, but it is
!     provided in case it is useful.
  <programlisting>
  postgres=# select pg_start_backup('label_goes_here');
   pg_start_backup
--- 14095,14103 ----
      database cluster's data directory, performs a checkpoint,
      and then returns the backup's starting transaction log location as text.
      The user can ignore this result value, but it is
!     provided in case it is useful. If <function>pg_start_backup</> is
!     executed during recovery, it performs a restartpoint rather than
!     writing a new checkpoint.
  <programlisting>
  postgres=# select pg_start_backup('label_goes_here');
   pg_start_backup
***************
*** 14122,14127 **** postgres=# select pg_start_backup('label_goes_here');
--- 14125,14137 ----
     </para>
  
     <para>
+     If <function>pg_stop_backup</> is executed during recovery, it just
+     removes the label file, but doesn't create a backup history file and wait for
+     the ending transaction log file to be archived. The return value is equal to
+     or bigger than the exact backup's ending transaction log location.
+    </para>
+ 
+    <para>
      <function>pg_switch_xlog</> moves to the next transaction log file, allowing the
      current file to be archived (assuming you are using continuous archiving).
      The return value is the ending transaction log location + 1 within the just-completed transaction log file.
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 664,670 **** static void xlog_outrec(StringInfo buf, XLogRecord *record);
  #endif
  static void pg_start_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 				  bool *backupEndRequired);
  static void rm_redo_error_callback(void *arg);
  static int	get_sync_bit(int method);
  
--- 664,670 ----
  #endif
  static void pg_start_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 				  bool *backupEndRequired, bool *backupDuringRecovery);
  static void rm_redo_error_callback(void *arg);
  static int	get_sync_bit(int method);
  
***************
*** 6023,6028 **** StartupXLOG(void)
--- 6023,6030 ----
  	uint32		freespace;
  	TransactionId oldestActiveXID;
  	bool		backupEndRequired = false;
+ 	bool		backupDuringRecovery = false;
+ 	DBState	save_state;
  
  	/*
  	 * Read control file and check XLOG status looks valid.
***************
*** 6156,6162 **** StartupXLOG(void)
  	if (StandbyMode)
  		OwnLatch(&XLogCtl->recoveryWakeupLatch);
  
! 	if (read_backup_label(&checkPointLoc, &backupEndRequired))
  	{
  		/*
  		 * When a backup_label file is present, we want to roll forward from
--- 6158,6165 ----
  	if (StandbyMode)
  		OwnLatch(&XLogCtl->recoveryWakeupLatch);
  
! 	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 						  &backupDuringRecovery))
  	{
  		/*
  		 * When a backup_label file is present, we want to roll forward from
***************
*** 6312,6317 **** StartupXLOG(void)
--- 6315,6321 ----
  		 * pg_control with any minimum recovery stop point obtained from a
  		 * backup history file.
  		 */
+ 		save_state = ControlFile->state;
  		if (InArchiveRecovery)
  			ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
  		else
***************
*** 6332,6343 **** StartupXLOG(void)
  		}
  
  		/*
! 		 * set backupStartPoint if we're starting recovery from a base backup
  		 */
  		if (haveBackupLabel)
  		{
  			ControlFile->backupStartPoint = checkPoint.redo;
  			ControlFile->backupEndRequired = backupEndRequired;
  		}
  		ControlFile->time = (pg_time_t) time(NULL);
  		/* No need to hold ControlFileLock yet, we aren't up far enough */
--- 6336,6369 ----
  		}
  
  		/*
! 		 * Set backupStartPoint if we're starting recovery from a base backup.
! 		 *
! 		 * Set backupEndPoint if we're starting recovery from a base backup
! 		 * which was taken from the server in recovery mode. We confirm
! 		 * that minRecoveryPoint can be used as the backup end location by
! 		 * checking whether the database system status in pg_control indicates
! 		 * DB_IN_ARCHIVE_RECOVERY. If minRecoveryPoint is not available,
! 		 * there is no way to know the backup end location, so we cannot
! 		 * advance recovery any more. In this case, we have to cancel recovery
! 		 * before changing the database system status in pg_control to
! 		 * DB_IN_ARCHIVE_RECOVERY because otherwise subsequent
! 		 * restarted recovery would go through this check wrongly.
  		 */
  		if (haveBackupLabel)
  		{
  			ControlFile->backupStartPoint = checkPoint.redo;
  			ControlFile->backupEndRequired = backupEndRequired;
+ 
+ 			if (backupDuringRecovery)
+ 			{
+ 				if (save_state != DB_IN_ARCHIVE_RECOVERY)
+ 					ereport(FATAL,
+ 							(errmsg("database system status mismatches between "
+ 									"pg_control and backup_label"),
+ 							 errhint("This means that the backup is corrupted and you will "
+ 									 "have to use another backup for recovery.")));
+ 				ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
+ 			}
  		}
  		ControlFile->time = (pg_time_t) time(NULL);
  		/* No need to hold ControlFileLock yet, we aren't up far enough */
***************
*** 6617,6622 **** StartupXLOG(void)
--- 6643,6670 ----
  				/* Pop the error context stack */
  				error_context_stack = errcontext.previous;
  
+ 				if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
+ 					XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
+ 				{
+ 					/*
+ 					 * We have reached the end of base backup, the point where
+ 					 * the minimum recovery point in pg_control which was
+ 					 * backed up just before pg_stop_backup() indicates.
+ 					 * The data on disk is now consistent. Reset backupStartPoint
+ 					 * and backupEndPoint.
+ 					 */
+ 					elog(DEBUG1, "end of backup reached");
+ 
+ 					LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ 
+ 					MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
+ 					MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
+ 					ControlFile->backupEndRequired = false;
+ 					UpdateControlFile();
+ 
+ 					LWLockRelease(ControlFileLock);
+ 				}
+ 
  				/*
  				 * Update shared recoveryLastRecPtr after this record has been
  				 * replayed.
***************
*** 8417,8423 **** xlog_redo(XLogRecPtr lsn, XLogRecord *record)
  		 * never arrive.
  		 */
  		if (InArchiveRecovery &&
! 			!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
  			ereport(ERROR,
  					(errmsg("online backup was canceled, recovery cannot continue")));
  
--- 8465,8472 ----
  		 * never arrive.
  		 */
  		if (InArchiveRecovery &&
! 			!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
! 			XLogRecPtrIsInvalid(ControlFile->backupEndPoint))
  			ereport(ERROR,
  					(errmsg("online backup was canceled, recovery cannot continue")));
  
***************
*** 8880,8885 **** XLogRecPtr
--- 8929,8935 ----
  do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  {
  	bool		exclusive = (labelfile == NULL);
+ 	bool		recovery_in_progress = false;
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
***************
*** 8891,8908 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  	FILE	   *fp;
  	StringInfoData labelfbuf;
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
  
! 	if (RecoveryInProgress())
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("recovery is in progress"),
! 				 errhint("WAL control functions cannot be executed during recovery.")));
! 
! 	if (!XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
--- 8941,8960 ----
  	FILE	   *fp;
  	StringInfoData labelfbuf;
  
+ 	recovery_in_progress = RecoveryInProgress();
+ 
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
  
! 	/*
! 	 * During recovery, we don't need to check WAL level. Because the fact that
! 	 * we are now executing pg_start_backup() during recovery means that
! 	 * wal_level is set to hot_standby on the master, i.e., WAL level is sufficient
! 	 * for making an online backup.
! 	 */
! 	if (!recovery_in_progress && !XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
***************
*** 8924,8931 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  	 * we won't have a history file covering the old timeline if pg_xlog
  	 * directory was not included in the base backup and the WAL archive was
  	 * cleared too before starting the backup.
  	 */
! 	RequestXLogSwitch();
  
  	/*
  	 * Mark backup active in shared memory.  We must do full-page WAL writes
--- 8976,8988 ----
  	 * we won't have a history file covering the old timeline if pg_xlog
  	 * directory was not included in the base backup and the WAL archive was
  	 * cleared too before starting the backup.
+ 	 *
+ 	 * During recovery, we skip forcing XLOG file switch, which means that
+ 	 * the backup taken during recovery is not available for the special recovery
+ 	 * case described above.
  	 */
! 	if (!recovery_in_progress)
! 		RequestXLogSwitch();
  
  	/*
  	 * Mark backup active in shared memory.  We must do full-page WAL writes
***************
*** 8941,8946 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 8998,9006 ----
  	 * since we expect that any pages not modified during the backup interval
  	 * must have been correctly captured by the backup.)
  	 *
+ 	 * Note that forcePageWrites has no effect during an online backup from
+ 	 * the server in recovery mode.
+ 	 *
  	 * We must hold WALInsertLock to change the value of forcePageWrites, to
  	 * ensure adequate interlocking against XLogInsert().
  	 */
***************
*** 8970,8980 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  		do
  		{
  			/*
! 			 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
  			 * page problems, this guarantees that two successive backup runs
  			 * will have different checkpoint positions and hence different
  			 * history file names, even if nothing happened in between.
  			 *
  			 * We use CHECKPOINT_IMMEDIATE only if requested by user (via
  			 * passing fast = true).  Otherwise this can take awhile.
  			 */
--- 9030,9048 ----
  		do
  		{
  			/*
! 			 * Force a CHECKPOINT.  Aside from being necessary to prevent torn
  			 * page problems, this guarantees that two successive backup runs
  			 * will have different checkpoint positions and hence different
  			 * history file names, even if nothing happened in between.
  			 *
+ 			 * During recovery, establish a restartpoint if possible. We use the last
+ 			 * restartpoint as the backup starting checkpoint. This means that two
+ 			 * successive backup runs can have same checkpoint positions.
+ 			 *
+ 			 * Since the fact that we are executing pg_start_backup() during
+ 			 * recovery means that bgwriter is running, we can use
+ 			 * RequestCheckpoint() to establish a restartpoint.
+ 			 *
  			 * We use CHECKPOINT_IMMEDIATE only if requested by user (via
  			 * passing fast = true).  Otherwise this can take awhile.
  			 */
***************
*** 9002,9007 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9070,9081 ----
  			 * for each backup instead of forcing another checkpoint, but
  			 * taking a checkpoint right after another is not that expensive
  			 * either because only few buffers have been dirtied yet.
+ 			 *
+ 			 * During recovery, since we don't use the end-of-backup WAL
+ 			 * record and don't write the backup history file, the starting WAL
+ 			 * location doesn't need to be unique. This means that two base
+ 			 * backups started at the same time might use the same checkpoint
+ 			 * as starting locations.
  			 */
  			LWLockAcquire(WALInsertLock, LW_SHARED);
  			if (XLByteLT(XLogCtl->Insert.lastBackupStart, startpoint))
***************
*** 9010,9015 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9084,9091 ----
  				gotUniqueStartpoint = true;
  			}
  			LWLockRelease(WALInsertLock);
+ 			if (recovery_in_progress)
+ 				gotUniqueStartpoint = true;
  		} while (!gotUniqueStartpoint);
  
  		XLByteToSeg(startpoint, _logId, _logSeg);
***************
*** 9031,9036 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9107,9114 ----
  						 checkpointloc.xlogid, checkpointloc.xrecoff);
  		appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n",
  						 exclusive ? "pg_start_backup" : "streamed");
+ 		appendStringInfo(&labelfbuf, "SYSTEM STATUS: %s\n",
+ 						 recovery_in_progress ? "recovery" : "in production");
  		appendStringInfo(&labelfbuf, "START TIME: %s\n", strfbuf);
  		appendStringInfo(&labelfbuf, "LABEL: %s\n", backupidstr);
  
***************
*** 9123,9128 **** pg_start_backup_callback(int code, Datum arg)
--- 9201,9208 ----
   * history file at the beginning of archive recovery, but we now use the WAL
   * record for that and the file is for informational and debug purposes only.
   *
+  * During recovery, we only remove the backup label file.
+  *
   * Note: different from CancelBackup which just cancels online backup mode.
   */
  Datum
***************
*** 9149,9154 **** XLogRecPtr
--- 9229,9235 ----
  do_pg_stop_backup(char *labelfile, bool waitforarchive)
  {
  	bool		exclusive = (labelfile == NULL);
+ 	bool		recovery_in_progress = false;
  	XLogRecPtr	startpoint;
  	XLogRecPtr	stoppoint;
  	XLogRecData rdata;
***************
*** 9159,9164 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
--- 9240,9246 ----
  	char		stopxlogfilename[MAXFNAMELEN];
  	char		lastxlogfilename[MAXFNAMELEN];
  	char		histfilename[MAXFNAMELEN];
+ 	char		systemstatus[20];
  	uint32		_logId;
  	uint32		_logSeg;
  	FILE	   *lfp;
***************
*** 9168,9186 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
  	int			waits = 0;
  	bool		reported_waiting = false;
  	char	   *remaining;
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
  
! 	if (RecoveryInProgress())
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("recovery is in progress"),
! 				 errhint("WAL control functions cannot be executed during recovery.")));
! 
! 	if (!XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
--- 9250,9271 ----
  	int			waits = 0;
  	bool		reported_waiting = false;
  	char	   *remaining;
+ 	char	   *ptr;
+ 
+ 	recovery_in_progress = RecoveryInProgress();
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
  
! 	/*
! 	 * During recovery, we don't need to check WAL level. Because the fact that
! 	 * we are now executing pg_stop_backup() means that wal_level is set to
! 	 * hot_standby on the master, i.e., WAL level is sufficient for making an online
! 	 * backup.
! 	 */
! 	if (!recovery_in_progress && !XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
***************
*** 9271,9276 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
--- 9356,9414 ----
  	remaining = strchr(labelfile, '\n') + 1;	/* %n is not portable enough */
  
  	/*
+ 	 * Parse the SYSTEM STATUS line, and check that database system
+ 	 * status matches between pg_start_backup() and pg_stop_backup().
+ 	 */
+ 	ptr = strstr(remaining, "SYSTEM STATUS:");
+ 	if (sscanf(ptr, "SYSTEM STATUS: %19s\n", systemstatus) != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
+ 	if (strcmp(systemstatus, "recovery") == 0 && !recovery_in_progress)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("pg_stop_backup() was executed during normal processing "
+ 						"though pg_start_backup() was executed during recovery"),
+ 				 errhint("The database backup will not be usable.")));
+ 
+ 	/*
+ 	 * During recovery, we don't write an end-of-backup record. We can
+ 	 * assume that pg_control was backed up just before pg_stop_backup()
+ 	 * and its minimum recovery point can be available as the backup end
+ 	 * location. Without an end-of-backup record, we can check correctly
+ 	 * whether we've reached the end of backup when starting recovery
+ 	 * from this backup.
+ 	 *
+ 	 * We don't force a switch to new WAL file and wait for all the required
+ 	 * files to be archived. This is okay if we use the backup to start
+ 	 * the standby. But, if it's for an archive recovery, to ensure all the
+ 	 * required files are available, a user should wait for them to be archived,
+ 	 * or include them into the backup after pg_stop_backup().
+ 	 *
+ 	 * We return the current minimum recovery point as the backup end
+ 	 * location. Note that it's would be bigger than the exact backup end
+ 	 * location if the minimum recovery point is updated since the backup
+ 	 * of pg_control. The return value of pg_stop_backup() is often used
+ 	 * for a user to calculate the required files. Returning approximate
+ 	 * location is harmless for that use because it's guaranteed not to be
+ 	 * smaller than the exact backup end location.
+ 	 *
+ 	 * XXX currently a backup history file is for informational and debug
+ 	 * purposes only. It's not essential for an online backup. Furthermore,
+ 	 * even if it's created, it will not be archived during recovery because
+ 	 * an archiver is not invoked. So it doesn't seem worthwhile to write
+ 	 * a backup history file during recovery.
+ 	 */
+ 	if (recovery_in_progress)
+ 	{
+ 		LWLockAcquire(ControlFileLock, LW_SHARED);
+ 		stoppoint = ControlFile->minRecoveryPoint;
+ 		LWLockRelease(ControlFileLock);
+ 
+ 		return stoppoint;
+ 	}
+ 
+ 	/*
  	 * Write the backup-end xlog record
  	 */
  	rdata.data = (char *) (&startpoint);
***************
*** 9787,9804 **** pg_xlogfile_name(PG_FUNCTION_ARGS)
   * Returns TRUE if a backup_label was found (and fills the checkpoint
   * location and its REDO location into *checkPointLoc and RedoStartLSN,
   * respectively); returns FALSE if not. If this backup_label came from a
!  * streamed backup, *backupEndRequired is set to TRUE.
   */
  static bool
! read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli;
  	FILE	   *lfp;
  	char		ch;
  	char		backuptype[20];
  
  	*backupEndRequired = false;
  
  	/*
  	 * See if label file is present
--- 9925,9946 ----
   * Returns TRUE if a backup_label was found (and fills the checkpoint
   * location and its REDO location into *checkPointLoc and RedoStartLSN,
   * respectively); returns FALSE if not. If this backup_label came from a
!  * streamed backup, *backupEndRequired is set to TRUE. If this backup_label
!  * was created during recovery, *backupDuringRecovery is set to TRUE.
   */
  static bool
! read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
! 				  bool *backupDuringRecovery)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli;
  	FILE	   *lfp;
  	char		ch;
  	char		backuptype[20];
+ 	char		systemstatus[20];
  
  	*backupEndRequired = false;
+ 	*backupDuringRecovery = false;
  
  	/*
  	 * See if label file is present
***************
*** 9832,9847 **** read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired)
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
  	/*
! 	 * BACKUP METHOD line is new in 9.1. We can't restore from an older backup
! 	 * anyway, but since the information on it is not strictly required, don't
! 	 * error out if it's missing for some reason.
  	 */
! 	if (fscanf(lfp, "BACKUP METHOD: %19s", backuptype) == 1)
  	{
  		if (strcmp(backuptype, "streamed") == 0)
  			*backupEndRequired = true;
  	}
  
  	if (ferror(lfp) || FreeFile(lfp))
  		ereport(FATAL,
  				(errcode_for_file_access(),
--- 9974,9995 ----
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
  	/*
! 	 * BACKUP METHOD and SYSTEM STATUS lines are new in 9.2. We can't
! 	 * restore from an older backup anyway, but since the information on it
! 	 * is not strictly required, don't error out if it's missing for some reason.
  	 */
! 	if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
  	{
  		if (strcmp(backuptype, "streamed") == 0)
  			*backupEndRequired = true;
  	}
  
+ 	if (fscanf(lfp, "SYSTEM STATUS: %19s\n", systemstatus) == 1)
+ 	{
+ 		if (strcmp(systemstatus, "recovery") == 0)
+ 			*backupDuringRecovery = true;
+ 	}
+ 
  	if (ferror(lfp) || FreeFile(lfp))
  		ereport(FATAL,
  				(errcode_for_file_access(),
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 287,292 **** typedef enum
--- 287,294 ----
  static PMState pmState = PM_INIT;
  
  static bool ReachedNormalRunning = false;		/* T if we've reached PM_RUN */
+ static bool OnlineBackupAllowed = false;		/* T if we've reached PM_RUN or
+ 												 * PM_HOT_STANDBY */
  
  bool		ClientAuthInProgress = false;		/* T during new-client
  												 * authentication */
***************
*** 2105,2122 **** pmdie(SIGNAL_ARGS)
  				/* and the walwriter too */
  				if (WalWriterPID != 0)
  					signal_child(WalWriterPID, SIGTERM);
! 
! 				/*
! 				 * If we're in recovery, we can't kill the startup process
! 				 * right away, because at present doing so does not release
! 				 * its locks.  We might want to change this in a future
! 				 * release.  For the time being, the PM_WAIT_READONLY state
! 				 * indicates that we're waiting for the regular (read only)
! 				 * backends to die off; once they do, we'll kill the startup
! 				 * and walreceiver processes.
! 				 */
! 				pmState = (pmState == PM_RUN) ?
! 					PM_WAIT_BACKUP : PM_WAIT_READONLY;
  			}
  
  			/*
--- 2107,2113 ----
  				/* and the walwriter too */
  				if (WalWriterPID != 0)
  					signal_child(WalWriterPID, SIGTERM);
! 				pmState = PM_WAIT_BACKUP;
  			}
  
  			/*
***************
*** 2299,2304 **** reaper(SIGNAL_ARGS)
--- 2290,2296 ----
  			 */
  			FatalError = false;
  			ReachedNormalRunning = true;
+ 			OnlineBackupAllowed = true;
  			pmState = PM_RUN;
  
  			/*
***************
*** 2823,2831 **** PostmasterStateMachine(void)
  	{
  		/*
  		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
  		 */
  		if (!BackupInProgress())
! 			pmState = PM_WAIT_BACKENDS;
  	}
  
  	if (pmState == PM_WAIT_READONLY)
--- 2815,2831 ----
  	{
  		/*
  		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
+ 		 *
+ 		 * If we're in recovery, we can't kill the startup process right away,
+ 		 * because at present doing so does not release its locks.  We might
+ 		 * want to change this in a future release.  For the time being,
+ 		 * the PM_WAIT_READONLY state indicates that we're waiting for
+ 		 * the regular (read only) backends to die off; once they do,
+ 		 * we'll kill the startup and walreceiver processes.
  		 */
  		if (!BackupInProgress())
! 			pmState = ReachedNormalRunning ?
! 				PM_WAIT_BACKENDS : PM_WAIT_READONLY;
  	}
  
  	if (pmState == PM_WAIT_READONLY)
***************
*** 2994,3006 **** PostmasterStateMachine(void)
  			/*
  			 * Terminate backup mode to avoid recovery after a clean fast
  			 * shutdown.  Since a backup can only be taken during normal
! 			 * running (and not, for example, while running under Hot Standby)
! 			 * it only makes sense to do this if we reached normal running. If
! 			 * we're still in recovery, the backup file is one we're
! 			 * recovering *from*, and we must keep it around so that recovery
! 			 * restarts from the right place.
  			 */
! 			if (ReachedNormalRunning)
  				CancelBackup();
  
  			/* Normal exit from the postmaster is here */
--- 2994,3006 ----
  			/*
  			 * Terminate backup mode to avoid recovery after a clean fast
  			 * shutdown.  Since a backup can only be taken during normal
! 			 * running and hot standby, it only makes sense to do this
! 			 * if we reached normal running or hot standby. If we have not
! 			 * reached a consistent recovery state yet, the backup file is
! 			 * one we're recovering *from*, and we must keep it around
! 			 * so that recovery restarts from the right place.
  			 */
! 			if (OnlineBackupAllowed)
  				CancelBackup();
  
  			/* Normal exit from the postmaster is here */
***************
*** 4157,4162 **** sigusr1_handler(SIGNAL_ARGS)
--- 4157,4163 ----
  		ereport(LOG,
  		(errmsg("database system is ready to accept read only connections")));
  
+ 		OnlineBackupAllowed = true;
  		pmState = PM_HOT_STANDBY;
  	}
  
*** a/src/bin/pg_controldata/pg_controldata.c
--- b/src/bin/pg_controldata/pg_controldata.c
***************
*** 232,237 **** main(int argc, char *argv[])
--- 232,240 ----
  	printf(_("Backup start location:                %X/%X\n"),
  		   ControlFile.backupStartPoint.xlogid,
  		   ControlFile.backupStartPoint.xrecoff);
+ 	printf(_("Backup end location:                  %X/%X\n"),
+ 		   ControlFile.backupEndPoint.xlogid,
+ 		   ControlFile.backupEndPoint.xrecoff);
  	printf(_("End-of-backup record required:        %s\n"),
  		   ControlFile.backupEndRequired ? _("yes") : _("no"));
  	printf(_("Current wal_level setting:            %s\n"),
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
***************
*** 883,897 **** do_stop(void)
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present, we're recovering from an online
! 		 * backup instead of performing one.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0 &&
! 			stat(recovery_file, &statbuf) != 0)
  		{
! 			print_msg(_("WARNING: online backup mode is active\n"
! 						"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
--- 883,900 ----
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present and new connection has not been
! 		 * allowed yet, an online backup mode must not be active.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0)
  		{
! 			if (stat(recovery_file, &statbuf) != 0)
! 				print_msg(_("WARNING: online backup mode is active\n"
! 							"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
! 			else
! 				print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n"
! 							"If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
***************
*** 971,985 **** do_restart(void)
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present, we're recovering from an online
! 		 * backup instead of performing one.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0 &&
! 			stat(recovery_file, &statbuf) != 0)
  		{
! 			print_msg(_("WARNING: online backup mode is active\n"
! 						"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
--- 974,991 ----
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present and new connection has not been
! 		 * allowed yet, an online backup mode must not be active.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0)
  		{
! 			if (stat(recovery_file, &statbuf) != 0)
! 				print_msg(_("WARNING: online backup mode is active\n"
! 							"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
! 			else
! 				print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n"
! 							"If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
***************
*** 503,509 **** GuessControlValues(void)
  	ControlFile.time = (pg_time_t) time(NULL);
  	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
  
! 	/* minRecoveryPoint and backupStartPoint can be left zero */
  
  	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
  	ControlFile.MaxConnections = 100;
--- 503,509 ----
  	ControlFile.time = (pg_time_t) time(NULL);
  	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
  
! 	/* minRecoveryPoint, backupStartPoint and backupEndPoint can be left zero */
  
  	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
  	ControlFile.MaxConnections = 100;
***************
*** 637,642 **** RewriteControlFile(void)
--- 637,644 ----
  	ControlFile.minRecoveryPoint.xrecoff = 0;
  	ControlFile.backupStartPoint.xlogid = 0;
  	ControlFile.backupStartPoint.xrecoff = 0;
+ 	ControlFile.backupEndPoint.xlogid = 0;
+ 	ControlFile.backupEndPoint.xrecoff = 0;
  	ControlFile.backupEndRequired = false;
  
  	/*
*** a/src/include/catalog/pg_control.h
--- b/src/include/catalog/pg_control.h
***************
*** 21,27 ****
  
  
  /* Version identifier for this pg_control format */
! #define PG_CONTROL_VERSION	921
  
  /*
   * Body of CheckPoint XLOG records.  This is declared here because we keep
--- 21,27 ----
  
  
  /* Version identifier for this pg_control format */
! #define PG_CONTROL_VERSION	922
  
  /*
   * Body of CheckPoint XLOG records.  This is declared here because we keep
***************
*** 138,143 **** typedef struct ControlFileData
--- 138,150 ----
  	 * record, to make sure the end-of-backup record corresponds the base
  	 * backup we're recovering from.
  	 *
+ 	 * backupEndPoint is the backup end location, if we are recovering from
+ 	 * an online backup which was taken from the server in recovery mode
+ 	 * and haven't reached the end of backup yet. It is initialized to
+ 	 * the minimum recovery point in pg_control which was backed up just
+ 	 * before pg_stop_backup(). It is reset to zero when the end of backup
+ 	 * is reached, and we mustn't start up before that.
+ 	 *
  	 * If backupEndRequired is true, we know for sure that we're restoring
  	 * from a backup, and must see a backup-end record before we can safely
  	 * start up. If it's false, but backupStartPoint is set, a backup_label
***************
*** 146,151 **** typedef struct ControlFileData
--- 153,159 ----
  	 */
  	XLogRecPtr	minRecoveryPoint;
  	XLogRecPtr	backupStartPoint;
+ 	XLogRecPtr	backupEndPoint;
  	bool		backupEndRequired;
  
  	/*

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Online base backup from the hot-standby

Reply via email to