On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> 2011/9/13 Jun Ishiduka <ishizuka....@po.ntts.co.jp>:
>>
>> Update patch.
>>
>> Changes:
>>  * set 'on' full_page_writes by user (in document)
>>  * read "FROM: XX" in backup_label (in xlog.c)
>>  * check status when pg_stop_backup is executed (in xlog.c)
>
> Thanks for updating the patch.
>
> Before reviewing the patch, to encourage people to comment and
> review the patch, I explain what this patch provides:

Attached is the updated version of the patch. I refactored the code, fixed
some bugs, added lots of source code comments, improved the document,
but didn't change the basic design. Please check this patch, and let's use
this patch as the base if you agree with that.

In the current patch, there is no safeguard for preventing users from
taking backup during recovery when FPW is disabled. This is unsafe.
Are you planning to implement such a safeguard?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 935,940 **** SELECT pg_stop_backup();
--- 935,999 ----
     </para>
    </sect2>
  
+   <sect2 id="backup-during-recovery">
+    <title>Making a Base Backup during Recovery</title>
+ 
+    <para>
+     It's possible to make a base backup during recovery. Which allows a user
+     to take a base backup from the standby to offload the expense of
+     periodic backups from the master. Its procedure is similar to that
+     during normal running.
+   <orderedlist>
+    <listitem>
+     <para>
+      Ensure that hot standby is enabled (see <xref linkend="hot-standby">
+      for more information).
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Connect to the database as a superuser and execute <function>pg_start_backup</>.
+      This performs a restartpoint if there is at least one checkpoint record
+      replayed since last restartpoint.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Perform a file system backup.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Copy the pg_control file from the cluster directory to the backup as follows:
+ <programlisting>
+ cp $PGDATA/global/pg_control /mnt/server/backupdir/global
+ </programlisting>
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Again connect to the database as a superuser, and execute
+      <function>pg_stop_backup</>. This terminates the backup mode, but does not
+      perform a switch to the next WAL segment, create a backup history file and
+      wait for all required WAL segments to be archived,
+      unlike that during normal processing.
+     </para>
+    </listitem>
+   </orderedlist>
+    </para>
+ 
+    <para>
+     You cannot use the <application>pg_basebackup</> tool to take the backup
+     during recovery.
+    </para>
+    <para>
+     It's not possible to make a base backup from the server in recovery mode
+     when reading WAL written during a period when <varname>full_page_writes</>
+     was disabled. If you take a base backup from the standby,
+     <varname>full_page_writes</> must be set to true on the master.
+    </para>
+   </sect2>
+ 
    <sect2 id="backup-pitr-recovery">
     <title>Recovering Using a Continuous Archive Backup</title>
  
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1680,1685 **** SET ENABLE_SEQSCAN TO OFF;
--- 1680,1693 ----
         </para>
  
         <para>
+         WAL written while <varname>full_page_writes</> is disabled does not
+         contain enough information to make a base backup during recovery
+         (see <xref linkend="backup-during-recovery">),
+         so <varname>full_page_writes</> must be enabled on the master
+         to take a backup from the standby.
+        </para>
+ 
+        <para>
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
          The default is <literal>on</>.
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 14014,14020 **** SELECT set_config('log_statement_stats', 'off', false);
     <para>
      The functions shown in <xref
      linkend="functions-admin-backup-table"> assist in making on-line backups.
!     These functions cannot be executed during recovery.
     </para>
  
     <table id="functions-admin-backup-table">
--- 14014,14021 ----
     <para>
      The functions shown in <xref
      linkend="functions-admin-backup-table"> assist in making on-line backups.
!     These functions except <function>pg_start_backup</> and <function>pg_stop_backup</>
!     cannot be executed during recovery.
     </para>
  
     <table id="functions-admin-backup-table">
***************
*** 14094,14100 **** SELECT set_config('log_statement_stats', 'off', false);
      database cluster's data directory, performs a checkpoint,
      and then returns the backup's starting transaction log location as text.
      The user can ignore this result value, but it is
!     provided in case it is useful.
  <programlisting>
  postgres=# select pg_start_backup('label_goes_here');
   pg_start_backup
--- 14095,14103 ----
      database cluster's data directory, performs a checkpoint,
      and then returns the backup's starting transaction log location as text.
      The user can ignore this result value, but it is
!     provided in case it is useful. If <function>pg_start_backup</> is
!     executed during recovery, it performs a restartpoint rather than
!     writing a new checkpoint.
  <programlisting>
  postgres=# select pg_start_backup('label_goes_here');
   pg_start_backup
***************
*** 14122,14127 **** postgres=# select pg_start_backup('label_goes_here');
--- 14125,14137 ----
     </para>
  
     <para>
+     If <function>pg_stop_backup</> is executed during recovery, it just
+     removes the label file, but doesn't create a backup history file and wait for
+     the ending transaction log file to be archived. The return value is equal to
+     or bigger than the exact backup's ending transaction log location.
+    </para>
+ 
+    <para>
      <function>pg_switch_xlog</> moves to the next transaction log file, allowing the
      current file to be archived (assuming you are using continuous archiving).
      The return value is the ending transaction log location + 1 within the just-completed transaction log file.
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 664,670 **** static void xlog_outrec(StringInfo buf, XLogRecord *record);
  #endif
  static void pg_start_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 				  bool *backupEndRequired);
  static void rm_redo_error_callback(void *arg);
  static int	get_sync_bit(int method);
  
--- 664,670 ----
  #endif
  static void pg_start_backup_callback(int code, Datum arg);
  static bool read_backup_label(XLogRecPtr *checkPointLoc,
! 				  bool *backupEndRequired, bool *backupDuringRecovery);
  static void rm_redo_error_callback(void *arg);
  static int	get_sync_bit(int method);
  
***************
*** 6023,6028 **** StartupXLOG(void)
--- 6023,6030 ----
  	uint32		freespace;
  	TransactionId oldestActiveXID;
  	bool		backupEndRequired = false;
+ 	bool		backupDuringRecovery = false;
+ 	DBState	save_state;
  
  	/*
  	 * Read control file and check XLOG status looks valid.
***************
*** 6156,6162 **** StartupXLOG(void)
  	if (StandbyMode)
  		OwnLatch(&XLogCtl->recoveryWakeupLatch);
  
! 	if (read_backup_label(&checkPointLoc, &backupEndRequired))
  	{
  		/*
  		 * When a backup_label file is present, we want to roll forward from
--- 6158,6165 ----
  	if (StandbyMode)
  		OwnLatch(&XLogCtl->recoveryWakeupLatch);
  
! 	if (read_backup_label(&checkPointLoc, &backupEndRequired,
! 						  &backupDuringRecovery))
  	{
  		/*
  		 * When a backup_label file is present, we want to roll forward from
***************
*** 6312,6317 **** StartupXLOG(void)
--- 6315,6321 ----
  		 * pg_control with any minimum recovery stop point obtained from a
  		 * backup history file.
  		 */
+ 		save_state = ControlFile->state;
  		if (InArchiveRecovery)
  			ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
  		else
***************
*** 6332,6343 **** StartupXLOG(void)
  		}
  
  		/*
! 		 * set backupStartPoint if we're starting recovery from a base backup
  		 */
  		if (haveBackupLabel)
  		{
  			ControlFile->backupStartPoint = checkPoint.redo;
  			ControlFile->backupEndRequired = backupEndRequired;
  		}
  		ControlFile->time = (pg_time_t) time(NULL);
  		/* No need to hold ControlFileLock yet, we aren't up far enough */
--- 6336,6369 ----
  		}
  
  		/*
! 		 * Set backupStartPoint if we're starting recovery from a base backup.
! 		 *
! 		 * Set backupEndPoint if we're starting recovery from a base backup
! 		 * which was taken from the server in recovery mode. We confirm
! 		 * that minRecoveryPoint can be used as the backup end location by
! 		 * checking whether the database system status in pg_control indicates
! 		 * DB_IN_ARCHIVE_RECOVERY. If minRecoveryPoint is not available,
! 		 * there is no way to know the backup end location, so we cannot
! 		 * advance recovery any more. In this case, we have to cancel recovery
! 		 * before changing the database system status in pg_control to
! 		 * DB_IN_ARCHIVE_RECOVERY because otherwise subsequent
! 		 * restarted recovery would go through this check wrongly.
  		 */
  		if (haveBackupLabel)
  		{
  			ControlFile->backupStartPoint = checkPoint.redo;
  			ControlFile->backupEndRequired = backupEndRequired;
+ 
+ 			if (backupDuringRecovery)
+ 			{
+ 				if (save_state != DB_IN_ARCHIVE_RECOVERY)
+ 					ereport(FATAL,
+ 							(errmsg("database system status mismatches between "
+ 									"pg_control and backup_label"),
+ 							 errhint("This means that the backup is corrupted and you will "
+ 									 "have to use another backup for recovery.")));
+ 				ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
+ 			}
  		}
  		ControlFile->time = (pg_time_t) time(NULL);
  		/* No need to hold ControlFileLock yet, we aren't up far enough */
***************
*** 6617,6622 **** StartupXLOG(void)
--- 6643,6670 ----
  				/* Pop the error context stack */
  				error_context_stack = errcontext.previous;
  
+ 				if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
+ 					XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
+ 				{
+ 					/*
+ 					 * We have reached the end of base backup, the point where
+ 					 * the minimum recovery point in pg_control which was
+ 					 * backed up just before pg_stop_backup() indicates.
+ 					 * The data on disk is now consistent. Reset backupStartPoint
+ 					 * and backupEndPoint.
+ 					 */
+ 					elog(DEBUG1, "end of backup reached");
+ 
+ 					LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ 
+ 					MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
+ 					MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
+ 					ControlFile->backupEndRequired = false;
+ 					UpdateControlFile();
+ 
+ 					LWLockRelease(ControlFileLock);
+ 				}
+ 
  				/*
  				 * Update shared recoveryLastRecPtr after this record has been
  				 * replayed.
***************
*** 8417,8423 **** xlog_redo(XLogRecPtr lsn, XLogRecord *record)
  		 * never arrive.
  		 */
  		if (InArchiveRecovery &&
! 			!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
  			ereport(ERROR,
  					(errmsg("online backup was canceled, recovery cannot continue")));
  
--- 8465,8472 ----
  		 * never arrive.
  		 */
  		if (InArchiveRecovery &&
! 			!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
! 			XLogRecPtrIsInvalid(ControlFile->backupEndPoint))
  			ereport(ERROR,
  					(errmsg("online backup was canceled, recovery cannot continue")));
  
***************
*** 8880,8885 **** XLogRecPtr
--- 8929,8935 ----
  do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  {
  	bool		exclusive = (labelfile == NULL);
+ 	bool		recovery_in_progress = false;
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
***************
*** 8891,8908 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  	FILE	   *fp;
  	StringInfoData labelfbuf;
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
  
! 	if (RecoveryInProgress())
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("recovery is in progress"),
! 				 errhint("WAL control functions cannot be executed during recovery.")));
! 
! 	if (!XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
--- 8941,8960 ----
  	FILE	   *fp;
  	StringInfoData labelfbuf;
  
+ 	recovery_in_progress = RecoveryInProgress();
+ 
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
  
! 	/*
! 	 * During recovery, we don't need to check WAL level. Because the fact that
! 	 * we are now executing pg_start_backup() during recovery means that
! 	 * wal_level is set to hot_standby on the master, i.e., WAL level is sufficient
! 	 * for making an online backup.
! 	 */
! 	if (!recovery_in_progress && !XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
***************
*** 8924,8931 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  	 * we won't have a history file covering the old timeline if pg_xlog
  	 * directory was not included in the base backup and the WAL archive was
  	 * cleared too before starting the backup.
  	 */
! 	RequestXLogSwitch();
  
  	/*
  	 * Mark backup active in shared memory.  We must do full-page WAL writes
--- 8976,8988 ----
  	 * we won't have a history file covering the old timeline if pg_xlog
  	 * directory was not included in the base backup and the WAL archive was
  	 * cleared too before starting the backup.
+ 	 *
+ 	 * During recovery, we skip forcing XLOG file switch, which means that
+ 	 * the backup taken during recovery is not available for the special recovery
+ 	 * case described above.
  	 */
! 	if (!recovery_in_progress)
! 		RequestXLogSwitch();
  
  	/*
  	 * Mark backup active in shared memory.  We must do full-page WAL writes
***************
*** 8941,8946 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 8998,9006 ----
  	 * since we expect that any pages not modified during the backup interval
  	 * must have been correctly captured by the backup.)
  	 *
+ 	 * Note that forcePageWrites has no effect during an online backup from
+ 	 * the server in recovery mode.
+ 	 *
  	 * We must hold WALInsertLock to change the value of forcePageWrites, to
  	 * ensure adequate interlocking against XLogInsert().
  	 */
***************
*** 8970,8980 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  		do
  		{
  			/*
! 			 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
  			 * page problems, this guarantees that two successive backup runs
  			 * will have different checkpoint positions and hence different
  			 * history file names, even if nothing happened in between.
  			 *
  			 * We use CHECKPOINT_IMMEDIATE only if requested by user (via
  			 * passing fast = true).  Otherwise this can take awhile.
  			 */
--- 9030,9048 ----
  		do
  		{
  			/*
! 			 * Force a CHECKPOINT.  Aside from being necessary to prevent torn
  			 * page problems, this guarantees that two successive backup runs
  			 * will have different checkpoint positions and hence different
  			 * history file names, even if nothing happened in between.
  			 *
+ 			 * During recovery, establish a restartpoint if possible. We use the last
+ 			 * restartpoint as the backup starting checkpoint. This means that two
+ 			 * successive backup runs can have same checkpoint positions.
+ 			 *
+ 			 * Since the fact that we are executing pg_start_backup() during
+ 			 * recovery means that bgwriter is running, we can use
+ 			 * RequestCheckpoint() to establish a restartpoint.
+ 			 *
  			 * We use CHECKPOINT_IMMEDIATE only if requested by user (via
  			 * passing fast = true).  Otherwise this can take awhile.
  			 */
***************
*** 9002,9007 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9070,9081 ----
  			 * for each backup instead of forcing another checkpoint, but
  			 * taking a checkpoint right after another is not that expensive
  			 * either because only few buffers have been dirtied yet.
+ 			 *
+ 			 * During recovery, since we don't use the end-of-backup WAL
+ 			 * record and don't write the backup history file, the starting WAL
+ 			 * location doesn't need to be unique. This means that two base
+ 			 * backups started at the same time might use the same checkpoint
+ 			 * as starting locations.
  			 */
  			LWLockAcquire(WALInsertLock, LW_SHARED);
  			if (XLByteLT(XLogCtl->Insert.lastBackupStart, startpoint))
***************
*** 9010,9015 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9084,9091 ----
  				gotUniqueStartpoint = true;
  			}
  			LWLockRelease(WALInsertLock);
+ 			if (recovery_in_progress)
+ 				gotUniqueStartpoint = true;
  		} while (!gotUniqueStartpoint);
  
  		XLByteToSeg(startpoint, _logId, _logSeg);
***************
*** 9031,9036 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
--- 9107,9114 ----
  						 checkpointloc.xlogid, checkpointloc.xrecoff);
  		appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n",
  						 exclusive ? "pg_start_backup" : "streamed");
+ 		appendStringInfo(&labelfbuf, "SYSTEM STATUS: %s\n",
+ 						 recovery_in_progress ? "recovery" : "in production");
  		appendStringInfo(&labelfbuf, "START TIME: %s\n", strfbuf);
  		appendStringInfo(&labelfbuf, "LABEL: %s\n", backupidstr);
  
***************
*** 9123,9128 **** pg_start_backup_callback(int code, Datum arg)
--- 9201,9208 ----
   * history file at the beginning of archive recovery, but we now use the WAL
   * record for that and the file is for informational and debug purposes only.
   *
+  * During recovery, we only remove the backup label file.
+  *
   * Note: different from CancelBackup which just cancels online backup mode.
   */
  Datum
***************
*** 9149,9154 **** XLogRecPtr
--- 9229,9235 ----
  do_pg_stop_backup(char *labelfile, bool waitforarchive)
  {
  	bool		exclusive = (labelfile == NULL);
+ 	bool		recovery_in_progress = false;
  	XLogRecPtr	startpoint;
  	XLogRecPtr	stoppoint;
  	XLogRecData rdata;
***************
*** 9159,9164 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
--- 9240,9246 ----
  	char		stopxlogfilename[MAXFNAMELEN];
  	char		lastxlogfilename[MAXFNAMELEN];
  	char		histfilename[MAXFNAMELEN];
+ 	char		systemstatus[20];
  	uint32		_logId;
  	uint32		_logSeg;
  	FILE	   *lfp;
***************
*** 9168,9186 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
  	int			waits = 0;
  	bool		reported_waiting = false;
  	char	   *remaining;
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
  
! 	if (RecoveryInProgress())
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("recovery is in progress"),
! 				 errhint("WAL control functions cannot be executed during recovery.")));
! 
! 	if (!XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
--- 9250,9271 ----
  	int			waits = 0;
  	bool		reported_waiting = false;
  	char	   *remaining;
+ 	char	   *ptr;
+ 
+ 	recovery_in_progress = RecoveryInProgress();
  
  	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
  
! 	/*
! 	 * During recovery, we don't need to check WAL level. Because the fact that
! 	 * we are now executing pg_stop_backup() means that wal_level is set to
! 	 * hot_standby on the master, i.e., WAL level is sufficient for making an online
! 	 * backup.
! 	 */
! 	if (!recovery_in_progress && !XLogIsNeeded())
  		ereport(ERROR,
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  			  errmsg("WAL level not sufficient for making an online backup"),
***************
*** 9271,9276 **** do_pg_stop_backup(char *labelfile, bool waitforarchive)
--- 9356,9413 ----
  	remaining = strchr(labelfile, '\n') + 1;	/* %n is not portable enough */
  
  	/*
+ 	 * Parse the SYSTEM STATUS line, and check that database system
+ 	 * status matches between pg_start_backup() and pg_stop_backup().
+ 	 */
+ 	ptr = strstr(remaining, "SYSTEM STATUS:");
+ 	if (sscanf(ptr, "SYSTEM STATUS: %19s\n", systemstatus) != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
+ 	if (strcmp(systemstatus, "recovery") == 0 && !recovery_in_progress)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("database system status mismatches between "
+ 						"pg_start_backup() and pg_stop_backup()")));
+ 
+ 	/*
+ 	 * During recovery, we don't write an end-of-backup record. We can
+ 	 * assume that pg_control was backed up just before pg_stop_backup()
+ 	 * and its minimum recovery point can be available as the backup end
+ 	 * location. Without an end-of-backup record, we can check correctly
+ 	 * whether we've reached the end of backup when starting recovery
+ 	 * from this backup.
+ 	 *
+ 	 * We don't force a switch to new WAL file and wait for all the required
+ 	 * files to be archived. This is okay if we use the backup to start
+ 	 * the standby. But, if it's for an archive recovery, to ensure all the
+ 	 * required files are available, a user should wait for them to be archived,
+ 	 * or include them into the backup after pg_stop_backup().
+ 	 *
+ 	 * We return the current minimum recovery point as the backup end
+ 	 * location. Note that it's would be bigger than the exact backup end
+ 	 * location if the minimum recovery point is updated since the backup
+ 	 * of pg_control. The return value of pg_stop_backup() is often used
+ 	 * for a user to calculate the required files. Returning approximate
+ 	 * location is harmless for that use because it's guaranteed not to be
+ 	 * smaller than the exact backup end location.
+ 	 *
+ 	 * XXX currently a backup history file is for informational and debug
+ 	 * purposes only. It's not essential for an online backup. Furthermore,
+ 	 * even if it's created, it will not be archived during recovery because
+ 	 * an archiver is not invoked. So it doesn't seem worthwhile to write
+ 	 * a backup history file during recovery.
+ 	 */
+ 	if (recovery_in_progress)
+ 	{
+ 		LWLockAcquire(ControlFileLock, LW_SHARED);
+ 		stoppoint = ControlFile->minRecoveryPoint;
+ 		LWLockRelease(ControlFileLock);
+ 
+ 		return stoppoint;
+ 	}
+ 
+ 	/*
  	 * Write the backup-end xlog record
  	 */
  	rdata.data = (char *) (&startpoint);
***************
*** 9787,9804 **** pg_xlogfile_name(PG_FUNCTION_ARGS)
   * Returns TRUE if a backup_label was found (and fills the checkpoint
   * location and its REDO location into *checkPointLoc and RedoStartLSN,
   * respectively); returns FALSE if not. If this backup_label came from a
!  * streamed backup, *backupEndRequired is set to TRUE.
   */
  static bool
! read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli;
  	FILE	   *lfp;
  	char		ch;
  	char		backuptype[20];
  
  	*backupEndRequired = false;
  
  	/*
  	 * See if label file is present
--- 9924,9945 ----
   * Returns TRUE if a backup_label was found (and fills the checkpoint
   * location and its REDO location into *checkPointLoc and RedoStartLSN,
   * respectively); returns FALSE if not. If this backup_label came from a
!  * streamed backup, *backupEndRequired is set to TRUE. If this backup_label
!  * was created during recovery, *backupDuringRecovery is set to TRUE.
   */
  static bool
! read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired,
! 				  bool *backupDuringRecovery)
  {
  	char		startxlogfilename[MAXFNAMELEN];
  	TimeLineID	tli;
  	FILE	   *lfp;
  	char		ch;
  	char		backuptype[20];
+ 	char		systemstatus[20];
  
  	*backupEndRequired = false;
+ 	*backupDuringRecovery = false;
  
  	/*
  	 * See if label file is present
***************
*** 9832,9847 **** read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired)
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
  	/*
! 	 * BACKUP METHOD line is new in 9.1. We can't restore from an older backup
! 	 * anyway, but since the information on it is not strictly required, don't
! 	 * error out if it's missing for some reason.
  	 */
! 	if (fscanf(lfp, "BACKUP METHOD: %19s", backuptype) == 1)
  	{
  		if (strcmp(backuptype, "streamed") == 0)
  			*backupEndRequired = true;
  	}
  
  	if (ferror(lfp) || FreeFile(lfp))
  		ereport(FATAL,
  				(errcode_for_file_access(),
--- 9973,9994 ----
  				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  				 errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
  	/*
! 	 * BACKUP METHOD and SYSTEM STATUS lines are new in 9.2. We can't
! 	 * restore from an older backup anyway, but since the information on it
! 	 * is not strictly required, don't error out if it's missing for some reason.
  	 */
! 	if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
  	{
  		if (strcmp(backuptype, "streamed") == 0)
  			*backupEndRequired = true;
  	}
  
+ 	if (fscanf(lfp, "SYSTEM STATUS: %19s\n", systemstatus) == 1)
+ 	{
+ 		if (strcmp(systemstatus, "recovery") == 0)
+ 			*backupDuringRecovery = true;
+ 	}
+ 
  	if (ferror(lfp) || FreeFile(lfp))
  		ereport(FATAL,
  				(errcode_for_file_access(),
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 287,292 **** typedef enum
--- 287,294 ----
  static PMState pmState = PM_INIT;
  
  static bool ReachedNormalRunning = false;		/* T if we've reached PM_RUN */
+ static bool OnlineBackupAllowed = false;		/* T if we've reached PM_RUN or
+ 												 * PM_HOT_STANDBY */
  
  bool		ClientAuthInProgress = false;		/* T during new-client
  												 * authentication */
***************
*** 2105,2122 **** pmdie(SIGNAL_ARGS)
  				/* and the walwriter too */
  				if (WalWriterPID != 0)
  					signal_child(WalWriterPID, SIGTERM);
! 
! 				/*
! 				 * If we're in recovery, we can't kill the startup process
! 				 * right away, because at present doing so does not release
! 				 * its locks.  We might want to change this in a future
! 				 * release.  For the time being, the PM_WAIT_READONLY state
! 				 * indicates that we're waiting for the regular (read only)
! 				 * backends to die off; once they do, we'll kill the startup
! 				 * and walreceiver processes.
! 				 */
! 				pmState = (pmState == PM_RUN) ?
! 					PM_WAIT_BACKUP : PM_WAIT_READONLY;
  			}
  
  			/*
--- 2107,2113 ----
  				/* and the walwriter too */
  				if (WalWriterPID != 0)
  					signal_child(WalWriterPID, SIGTERM);
! 				pmState = PM_WAIT_BACKUP;
  			}
  
  			/*
***************
*** 2299,2304 **** reaper(SIGNAL_ARGS)
--- 2290,2296 ----
  			 */
  			FatalError = false;
  			ReachedNormalRunning = true;
+ 			OnlineBackupAllowed = true;
  			pmState = PM_RUN;
  
  			/*
***************
*** 2823,2831 **** PostmasterStateMachine(void)
  	{
  		/*
  		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
  		 */
  		if (!BackupInProgress())
! 			pmState = PM_WAIT_BACKENDS;
  	}
  
  	if (pmState == PM_WAIT_READONLY)
--- 2815,2831 ----
  	{
  		/*
  		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
+ 		 *
+ 		 * If we're in recovery, we can't kill the startup process right away,
+ 		 * because at present doing so does not release its locks.  We might
+ 		 * want to change this in a future release.  For the time being,
+ 		 * the PM_WAIT_READONLY state indicates that we're waiting for
+ 		 * the regular (read only) backends to die off; once they do,
+ 		 * we'll kill the startup and walreceiver processes.
  		 */
  		if (!BackupInProgress())
! 			pmState = ReachedNormalRunning ?
! 				PM_WAIT_BACKENDS : PM_WAIT_READONLY;
  	}
  
  	if (pmState == PM_WAIT_READONLY)
***************
*** 2994,3006 **** PostmasterStateMachine(void)
  			/*
  			 * Terminate backup mode to avoid recovery after a clean fast
  			 * shutdown.  Since a backup can only be taken during normal
! 			 * running (and not, for example, while running under Hot Standby)
! 			 * it only makes sense to do this if we reached normal running. If
! 			 * we're still in recovery, the backup file is one we're
! 			 * recovering *from*, and we must keep it around so that recovery
! 			 * restarts from the right place.
  			 */
! 			if (ReachedNormalRunning)
  				CancelBackup();
  
  			/* Normal exit from the postmaster is here */
--- 2994,3006 ----
  			/*
  			 * Terminate backup mode to avoid recovery after a clean fast
  			 * shutdown.  Since a backup can only be taken during normal
! 			 * running and hot standby, it only makes sense to do this
! 			 * if we reached normal running or hot standby. If we have not
! 			 * reached a consistent recovery state yet, the backup file is
! 			 * one we're recovering *from*, and we must keep it around
! 			 * so that recovery restarts from the right place.
  			 */
! 			if (OnlineBackupAllowed)
  				CancelBackup();
  
  			/* Normal exit from the postmaster is here */
***************
*** 4157,4162 **** sigusr1_handler(SIGNAL_ARGS)
--- 4157,4163 ----
  		ereport(LOG,
  		(errmsg("database system is ready to accept read only connections")));
  
+ 		OnlineBackupAllowed = true;
  		pmState = PM_HOT_STANDBY;
  	}
  
*** a/src/bin/pg_controldata/pg_controldata.c
--- b/src/bin/pg_controldata/pg_controldata.c
***************
*** 232,237 **** main(int argc, char *argv[])
--- 232,240 ----
  	printf(_("Backup start location:                %X/%X\n"),
  		   ControlFile.backupStartPoint.xlogid,
  		   ControlFile.backupStartPoint.xrecoff);
+ 	printf(_("Backup end location:                  %X/%X\n"),
+ 		   ControlFile.backupEndPoint.xlogid,
+ 		   ControlFile.backupEndPoint.xrecoff);
  	printf(_("End-of-backup record required:        %s\n"),
  		   ControlFile.backupEndRequired ? _("yes") : _("no"));
  	printf(_("Current wal_level setting:            %s\n"),
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
***************
*** 883,897 **** do_stop(void)
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present, we're recovering from an online
! 		 * backup instead of performing one.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0 &&
! 			stat(recovery_file, &statbuf) != 0)
  		{
! 			print_msg(_("WARNING: online backup mode is active\n"
! 						"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
--- 883,900 ----
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present and new connection has not been
! 		 * allowed yet, an online backup mode must not be active.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0)
  		{
! 			if (stat(recovery_file, &statbuf) != 0)
! 				print_msg(_("WARNING: online backup mode is active\n"
! 							"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
! 			else
! 				print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n"
! 							"If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
***************
*** 971,985 **** do_restart(void)
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present, we're recovering from an online
! 		 * backup instead of performing one.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0 &&
! 			stat(recovery_file, &statbuf) != 0)
  		{
! 			print_msg(_("WARNING: online backup mode is active\n"
! 						"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
--- 974,991 ----
  		/*
  		 * If backup_label exists, an online backup is running. Warn the user
  		 * that smart shutdown will wait for it to finish. However, if
! 		 * recovery.conf is also present and new connection has not been
! 		 * allowed yet, an online backup mode must not be active.
  		 */
  		if (shutdown_mode == SMART_MODE &&
! 			stat(backup_file, &statbuf) == 0)
  		{
! 			if (stat(recovery_file, &statbuf) != 0)
! 				print_msg(_("WARNING: online backup mode is active\n"
! 							"Shutdown will not complete until pg_stop_backup() is called.\n\n"));
! 			else
! 				print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n"
! 							"If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));
  		}
  
  		print_msg(_("waiting for server to shut down..."));
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
***************
*** 503,509 **** GuessControlValues(void)
  	ControlFile.time = (pg_time_t) time(NULL);
  	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
  
! 	/* minRecoveryPoint and backupStartPoint can be left zero */
  
  	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
  	ControlFile.MaxConnections = 100;
--- 503,509 ----
  	ControlFile.time = (pg_time_t) time(NULL);
  	ControlFile.checkPoint = ControlFile.checkPointCopy.redo;
  
! 	/* minRecoveryPoint, backupStartPoint and backupEndPoint can be left zero */
  
  	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
  	ControlFile.MaxConnections = 100;
***************
*** 637,642 **** RewriteControlFile(void)
--- 637,644 ----
  	ControlFile.minRecoveryPoint.xrecoff = 0;
  	ControlFile.backupStartPoint.xlogid = 0;
  	ControlFile.backupStartPoint.xrecoff = 0;
+ 	ControlFile.backupEndPoint.xlogid = 0;
+ 	ControlFile.backupEndPoint.xrecoff = 0;
  	ControlFile.backupEndRequired = false;
  
  	/*
*** a/src/include/catalog/pg_control.h
--- b/src/include/catalog/pg_control.h
***************
*** 21,27 ****
  
  
  /* Version identifier for this pg_control format */
! #define PG_CONTROL_VERSION	921
  
  /*
   * Body of CheckPoint XLOG records.  This is declared here because we keep
--- 21,27 ----
  
  
  /* Version identifier for this pg_control format */
! #define PG_CONTROL_VERSION	922
  
  /*
   * Body of CheckPoint XLOG records.  This is declared here because we keep
***************
*** 138,143 **** typedef struct ControlFileData
--- 138,150 ----
  	 * record, to make sure the end-of-backup record corresponds the base
  	 * backup we're recovering from.
  	 *
+ 	 * backupEndPoint is the backup end location, if we are recovering from
+ 	 * an online backup which was taken from the server in recovery mode
+ 	 * and haven't reached the end of backup yet. It is initialized to
+ 	 * the minimum recovery point in pg_control which was backed up just
+ 	 * before pg_stop_backup(). It is reset to zero when the end of backup
+ 	 * is reached, and we mustn't start up before that.
+ 	 *
  	 * If backupEndRequired is true, we know for sure that we're restoring
  	 * from a backup, and must see a backup-end record before we can safely
  	 * start up. If it's false, but backupStartPoint is set, a backup_label
***************
*** 146,151 **** typedef struct ControlFileData
--- 153,159 ----
  	 */
  	XLogRecPtr	minRecoveryPoint;
  	XLogRecPtr	backupStartPoint;
+ 	XLogRecPtr	backupEndPoint;
  	bool		backupEndRequired;
  
  	/*
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to