[HACKERS] Hot Standby on git

Simon Riggs Sat, 26 Sep 2009 09:57:04 -0700

Just a note to say that the Hot Standby patch is now in a git repository:
 git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby


The complete contents of that repository are BSD licenced contributions
to the PostgreSQL project.

Any further changes to that will be by agreement here on hackers. From
now on, I will be submitting each individual change as a patch-on-patch
to allow people to see and discuss them and to confirm them as open source
contributions. I ask anybody else interested to do the same so that we
can work together. All contributions welcome.

My record of agreed changes is here:
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items

You'll notice that I've already completed 8 changes (10 commits); those
are all fairly minor changes, so submitted here as a combined patch.
There are 9 pending changes, so far, none of which appear to be major
obstacles to resolve. Many thanks to Heikki for a thorough review which
has identified nearly all of those change requests. 

I estimate that making the remaining changes noted on the Wiki and fully
testing them will take at least 2 weeks. Gabriele Bartolini is assisting
in this area, though neither of us is able to work full time on this.
We still have ample time to complete the project in this release.

Many thanks to Magnus and Aidan for helping me resolve my git-wrestling
contest and apologies for the delay while that bout happened.

-- 
 Simon Riggs           www.2ndQuadrant.com

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1934,1941 **** if (!triggered)
     </para>
  
     <para>
! 	Read-only here means "no writes to the permanent database tables". So
! 	there are no problems with queries that make use of temporary sort and
  	work files will be used.  Temporary tables cannot be created and
  	therefore cannot be used at all in recovery mode.
     </para>
--- 1934,1941 ----
     </para>
  
     <para>
! 	Read-only here means "no writes to the permanent database tables". 
! 	There are no problems with queries that make use of temporary sort and
  	work files will be used.  Temporary tables cannot be created and
  	therefore cannot be used at all in recovery mode.
     </para>
***************
*** 1983,1989 **** if (!triggered)
       </listitem>
  	 <listitem>
  	  <para>
!        LOCK, with restrictions, see later
        </para>
       </listitem>
  	 <listitem>
--- 1983,1989 ----
       </listitem>
  	 <listitem>
  	  <para>
!        LOCK TABLE, though only when explicitly IN ACCESS SHARE MODE
        </para>
       </listitem>
  	 <listitem>
***************
*** 2000,2014 **** if (!triggered)
     </para>
  
     <para>
! 	These actions will produce error messages
  
  	<itemizedlist>
  	 <listitem>
  	  <para>
!        DML - Insert, Update, Delete, COPY FROM, Truncate which all write data. 
! 	   Any RULE which generates DML will throw error messages as a result.
! 	   Note that there is no action possible that can result in a trigger
! 	   being executed.
        </para>
       </listitem>
  	 <listitem>
--- 2000,2013 ----
     </para>
  
     <para>
! 	These actions produce error messages
  
  	<itemizedlist>
  	 <listitem>
  	  <para>
!        DML - Insert, Update, Delete, COPY FROM, Truncate. 
! 	   Note that there are no actions that result in a trigger
! 	   being executed during recovery.
        </para>
       </listitem>
  	 <listitem>
***************
*** 2024,2029 **** if (!triggered)
--- 2023,2041 ----
       </listitem>
  	 <listitem>
  	  <para>
+        RULEs on SELECT statements that generate DML commands. RULEs on DML
+ 	   commands that produce only SELECT statements are already disallowed
+ 	   during read-only transactions.
+       </para>
+      </listitem>
+ 	 <listitem>
+ 	  <para>
+        LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+        LOCK TABLE that explicitly requests a lock other than ACCESS SHARE MODE.
+       </para>
+      </listitem>
+ 	 <listitem>
+ 	  <para>
         Transaction management commands that explicitly set non-read only state
  		<itemizedlist>
  		 <listitem>
***************
*** 2069,2077 **** if (!triggered)
      
     <para>
  	Note that current behaviour of read only transactions when not in
! 	recovery is to allow the last two actions, so there is a small and
! 	subtle difference in behaviour between standby read-only transactions
! 	and read only transactions during normal running.
  	It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
  	temporary tables may be lifted in a future release, if their internal
  	implementation is altered to make this possible.
--- 2081,2089 ----
      
     <para>
  	Note that current behaviour of read only transactions when not in
! 	recovery is to allow the last two actions, so there are small and
! 	subtle differences in behaviour between read-only transactions
! 	run on standby and during normal running.
  	It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
  	temporary tables may be lifted in a future release, if their internal
  	implementation is altered to make this possible.
***************
*** 2082,2088 **** if (!triggered)
  	processing mode. Sessions will remain connected while the server
  	changes mode. Current transactions will continue, though will remain
  	read-only. After this, it will be possible to initiate read-write
! 	transactions, though users must *manually* reset their 
  	default_transaction_read_only setting first, if they want that
  	behaviour.
     </para>
--- 2094,2100 ----
  	processing mode. Sessions will remain connected while the server
  	changes mode. Current transactions will continue, though will remain
  	read-only. After this, it will be possible to initiate read-write
! 	transactions, though users must explicitly reset their 
  	default_transaction_read_only setting first, if they want that
  	behaviour.
     </para>
***************
*** 2098,2107 **** if (!triggered)
     </para>
  
     <para>
! 	In recovery, transactions will not be permitted to take any lock higher
! 	other than AccessShareLock or AccessExclusiveLock. In addition,
! 	transactions may never assign a TransactionId and may never write WAL.
! 	The LOCK TABLE command by default applies an AccessExclusiveLock. 
  	Any LOCK TABLE command that runs on the standby and requests a specific
  	lock type other than AccessShareLock will be rejected.
     </para>
--- 2110,2118 ----
     </para>
  
     <para>
! 	In recovery, transactions will not be permitted to take any table lock
! 	higher than AccessShareLock. In addition, transactions may never assign
! 	a TransactionId and may never write WAL. 
  	Any LOCK TABLE command that runs on the standby and requests a specific
  	lock type other than AccessShareLock will be rejected.
     </para>
***************
*** 2168,2175 **** if (!triggered)
  
     <para>
  	An example of the above would be an Administrator on Primary server
! 	runs a DROP TABLE command that refers to a table currently in use by
! 	a User query on the standby server.
     </para>
  
     <para>
--- 2179,2186 ----
  
     <para>
  	An example of the above would be an Administrator on Primary server
! 	runs a DROP TABLE command on a table that's currently being queried
! 	in the standby server.
     </para>
  
     <para>
***************
*** 2198,2206 **** if (!triggered)
     <para>
  	We have a number of choices for resolving query conflicts.  The default
  	is that we wait and hope the query completes. If the recovery is not paused,
! 	then the server will wait automatically until the server the lag between
  	primary and standby is at most max_standby_delay seconds. Once that grace
! 	period expires, we then take one of the following actions:
  
  	  <itemizedlist>
  	   <listitem>
--- 2209,2217 ----
     <para>
  	We have a number of choices for resolving query conflicts.  The default
  	is that we wait and hope the query completes. If the recovery is not paused,
! 	then the server will wait automatically until the lag between
  	primary and standby is at most max_standby_delay seconds. Once that grace
! 	period expires, we take one of the following actions:
  
  	  <itemizedlist>
  	   <listitem>
***************
*** 2213,2219 **** if (!triggered)
  	    <para>
  		 If the conflict is caused by cleanup records we tell the standby query
  	 	 that a conflict has occurred and that it must cancel itself to avoid the
! 	 	 risk that it attempts to silently fails to read relevant data because
  	 	 that data has been removed. (This is very similar to the much feared
  		 error message "snapshot too old").
  	    </para>
--- 2224,2230 ----
  	    <para>
  		 If the conflict is caused by cleanup records we tell the standby query
  	 	 that a conflict has occurred and that it must cancel itself to avoid the
! 	 	 risk that it silently fails to read relevant data because
  	 	 that data has been removed. (This is very similar to the much feared
  		 error message "snapshot too old").
  	    </para>
***************
*** 2222,2228 **** if (!triggered)
  		 Note also that this means that idle-in-transaction sessions are never
  		 canceled except by locks. Users should be clear that tables that are
  		 regularly and heavily updated on primary server will quickly cause
! 		 cancellation of any longer running queries made against those tables.
  	    </para>
  
  	    <para>
--- 2233,2239 ----
  		 Note also that this means that idle-in-transaction sessions are never
  		 canceled except by locks. Users should be clear that tables that are
  		 regularly and heavily updated on primary server will quickly cause
! 		 cancellation of any longer running queries in the standby.
  	    </para>
  
  	    <para>
***************
*** 2235,2241 **** if (!triggered)
     </para>
  
     <para>
! 	Other remdial actions exist if the number of cancelations is unacceptable.
  	The first option is to connect to primary server and keep a query active
  	for as long as we need to run queries on the standby. This guarantees that
  	a WAL cleanup record is never generated and we don't ever get query
--- 2246,2252 ----
     </para>
  
     <para>
! 	Other remedial actions exist if the number of cancelations is unacceptable.
  	The first option is to connect to primary server and keep a query active
  	for as long as we need to run queries on the standby. This guarantees that
  	a WAL cleanup record is never generated and we don't ever get query
***************
*** 2283,2289 **** if (!triggered)
     <title>Administrator's Overview</title>
  
     <para>
! 	If there is a recovery.conf file present then the will start in Hot Standby
  	mode by default, though this can be disabled by setting
  	"recovery_connections = off" in recovery.conf. The server may take some
  	time to enable recovery connections since the server must first complete
--- 2294,2300 ----
     <title>Administrator's Overview</title>
  
     <para>
! 	If there is a recovery.conf file present the server will start in Hot Standby
  	mode by default, though this can be disabled by setting
  	"recovery_connections = off" in recovery.conf. The server may take some
  	time to enable recovery connections since the server must first complete
***************
*** 2308,2314 **** LOG:  database system is ready to accept read only connections
  	The setting of max_connections on the standby should be equal to or
  	greater than the setting of max_connections on the primary. This is to
  	ensure that standby has sufficient resources to manage incoming
! 	transactions.
     </para>
  
     <para>
--- 2319,2325 ----
  	The setting of max_connections on the standby should be equal to or
  	greater than the setting of max_connections on the primary. This is to
  	ensure that standby has sufficient resources to manage incoming
! 	transactions. max_prepared_transactions already has this restriction.
     </para>
  
     <para>
***************
*** 2329,2335 **** LOG:  database system is ready to accept read only connections
  	A set of functions allow superusers to control the flow of recovery
  	are described in <xref linkend="functions-recovery-control-table">.
  	These functions allow you to pause and continue recovery, as well
! 	as dynamically set new recovery targets wile recovery progresses.
  	Note that when a server is paused the apparent delay between primary
  	and standby will continue to increase.
     </para>
--- 2340,2346 ----
  	A set of functions allow superusers to control the flow of recovery
  	are described in <xref linkend="functions-recovery-control-table">.
  	These functions allow you to pause and continue recovery, as well
! 	as dynamically set new recovery targets while recovery progresses.
  	Note that when a server is paused the apparent delay between primary
  	and standby will continue to increase.
     </para>
***************
*** 2342,2348 **** LOG:  database system is ready to accept read only connections
  	themselves.  Users will be able to write large sort temp files and
  	re-generate relcache info files, so there is no part of the database
  	that is truly read-only during hot standby mode. There is no restriction
! 	on use of set returning functions, or other users of tuplestore/tuplesort
  	code. Note also that writes to remote databases will still be possible,
  	even though the transaction is read-only locally.
     </para>
--- 2353,2359 ----
  	themselves.  Users will be able to write large sort temp files and
  	re-generate relcache info files, so there is no part of the database
  	that is truly read-only during hot standby mode. There is no restriction
! 	on the use of set returning functions, or other users of tuplestore/tuplesort
  	code. Note also that writes to remote databases will still be possible,
  	even though the transaction is read-only locally.
     </para>
***************
*** 2354,2360 **** LOG:  database system is ready to accept read only connections
     </para>
  
     <para>
! 	The following types of administrator command will not be accepted
  	during recovery mode
  
  	  <itemizedlist>
--- 2365,2371 ----
     </para>
  
     <para>
! 	The following types of administrator command are not accepted
  	during recovery mode
  
  	  <itemizedlist>
***************
*** 2558,2563 **** LOG:  database system is ready to accept read only connections
--- 2569,2583 ----
  	 available for use when running queries during recovery.
      </para>
     </listitem>
+    <listitem>
+     <para>
+      Full knowledge of running transactions is required before snapshots
+ 	 may be taken. Transactions that use large numbers of subtransactions
+ 	 (currently greater than 64) will delay the start of read only
+ 	 connections until the completion of the longest running write transaction.
+ 	 If this situation occurs explanatory messages will be sent to server log.
+     </para>
+    </listitem>
    </itemizedlist>
  
     </para>
*** a/src/backend/access/gin/ginxlog.c
--- b/src/backend/access/gin/ginxlog.c
***************
*** 622,628 **** gin_redo(XLogRecPtr lsn, XLogRecord *record)
  	uint8		info = record->xl_info & ~XLR_INFO_MASK;
  
  	/*
! 	 * GIN indexes do not require any conflict processing. XXX really?
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
--- 622,630 ----
  	uint8		info = record->xl_info & ~XLR_INFO_MASK;
  
  	/*
! 	 * GIN indexes do not require any conflict processing. The GIN
! 	 * posting tree is scanned in logical order during VACUUM and
! 	 * no additional processing is required.
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
*** a/src/backend/access/gist/gistxlog.c
--- b/src/backend/access/gist/gistxlog.c
***************
*** 397,403 **** gist_redo(XLogRecPtr lsn, XLogRecord *record)
  	MemoryContext oldCxt;
  
  	/*
! 	 * GIST indexes do not require any conflict processing. XXX really?
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
--- 397,406 ----
  	MemoryContext oldCxt;
  
  	/*
! 	 * GIST indexes do not require any conflict processing. This is
! 	 * because GIST does not remove killed tuples when it performs
! 	 * page splits in the same way b-trees do. Also VACUUMs of 
! 	 * GIST indexes occur in logical not physical order.
  	 */
  	if (InHotStandby)
  		RecordKnownAssignedTransactionIds(record->xl_xid);
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 947,952 **** begin:;
--- 947,971 ----
  	FIN_CRC32(rdata_crc);
  	record->xl_crc = rdata_crc;
  
+ #ifdef WAL_DEBUG
+ 	if (XLOG_DEBUG)
+ 	{
+ 		StringInfoData buf;
+ 
+ 		initStringInfo(&buf);
+ 		appendStringInfo(&buf, "INSERT @ %X/%X: ",
+ 						 RecPtr.xlogid, RecPtr.xrecoff);
+ 		xlog_outrec(&buf, record);
+ 		if (rdata->data != NULL)
+ 		{
+ 			appendStringInfo(&buf, " - ");
+ 			RmgrTable[record->xl_rmid].rm_desc(&buf, record->xl_info, rdata->data);
+ 		}
+ 		elog(LOG, "%s", buf.data);
+ 		pfree(buf.data);
+ 	}
+ #endif
+ 
  	/* Record begin of record in appropriate places */
  	ProcLastRecPtr = RecPtr;
  	Insert->PrevRecord = RecPtr;
*** a/src/backend/commands/lockcmds.c
--- b/src/backend/commands/lockcmds.c
***************
*** 49,61 **** LockTableCommand(LockStmt *lockstmt)
  
  		/*
  		 * During recovery we only accept these variations:
! 		 *
! 		 * LOCK TABLE foo       -- implicitly, AccessExclusiveLock
! 		 * LOCK TABLE foo IN ACCESS SHARE MODE
! 		 * LOCK TABLE foo IN ACCESS EXCLUSIVE MODE
  		 */
! 		if (lockstmt->mode != AccessShareLock
! 			&& lockstmt->mode != AccessExclusiveLock)
  			PreventCommandDuringRecovery();
  
  		LockTableRecurse(reloid, relation,
--- 49,57 ----
  
  		/*
  		 * During recovery we only accept these variations:
! 		 * LOCK TABLE foo IN ACCESS SHARE MODE which is effectively a no-op
  		 */
! 		if (lockstmt->mode != AccessShareLock)
  			PreventCommandDuringRecovery();
  
  		LockTableRecurse(reloid, relation,
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 502,509 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec)
  	if (!xlrec->subxid_overflow)
  		recoverySnapshotValid = true;
  	else
! 		elog(trace_recovery(DEBUG2), 
! 				"running xact data has incomplete subtransaction data");
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
--- 502,509 ----
  	if (!xlrec->subxid_overflow)
  		recoverySnapshotValid = true;
  	else
! 		ereport(LOG, 
! 				(errmsg("consistent state delayed because recovery snapshot incomplete")));
  
  	xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt));
  	nxids = 0;
***************
*** 1502,1508 **** HaveTransactionsInCommit(TransactionId *xids, int nxids)
  
  /*
   * BackendPidGetProc -- get a backend's PGPROC given its PID
!  *
   * Returns NULL if not found.  Note that it is up to the caller to be
   * sure that the question remains meaningful for long enough for the
   * answer to be used ...
--- 1502,1508 ----
  
  /*
   * BackendPidGetProc -- get a backend's PGPROC given its PID
!  *	
   * Returns NULL if not found.  Note that it is up to the caller to be
   * sure that the question remains meaningful for long enough for the
   * answer to be used ...
***************
*** 1536,1576 **** BackendPidGetProc(int pid)
  }
  
  /*
-  * BackendXidGetProc -- get a backend's PGPROC given its XID
-  *
-  * Returns NULL if not found.  Note that it is up to the caller to be
-  * sure that the question remains meaningful for long enough for the
-  * answer to be used ...
-  */
- PGPROC *
- BackendXidGetProc(TransactionId xid)
- {
- 	PGPROC	   *result = NULL;
- 	ProcArrayStruct *arrayP = procArray;
- 	int			index;
- 
- 	if (xid == InvalidTransactionId)	/* never match invalid xid */
- 		return 0;
- 
- 	LWLockAcquire(ProcArrayLock, LW_SHARED);
- 
- 	for (index = 0; index < arrayP->numProcs; index++)
- 	{
- 		PGPROC	   *proc = arrayP->procs[index];
- 
- 		if (proc->xid == xid)
- 		{
- 			result = proc;
- 			break;
- 		}
- 	}
- 
- 	LWLockRelease(ProcArrayLock);
- 
- 	return result;
- }
- 
- /*
   * BackendXidGetPid -- get a backend's pid given its XID
   *
   * Returns 0 if not found or it's a prepared transaction.  Note that
--- 1536,1541 ----
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 2695,2705 **** ProcessInterrupts(void)
  					 * idle-in-transaction session, so make it FATAL instead.
  					 */
  					case CONFLICT_MODE_ERROR:
! 						cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					case CONFLICT_MODE_ERROR_IF_NOT_IDLE:
! 						cancelMode = CONFLICT_MODE_NOT_SET;
  							break;
  
  					default:
--- 2695,2713 ----
  					 * idle-in-transaction session, so make it FATAL instead.
  					 */
  					case CONFLICT_MODE_ERROR:
! 							cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					case CONFLICT_MODE_ERROR_IF_NOT_IDLE:
! 							/*
! 							 * If we still have a snapshot then we must
! 							 * cancel, else we are free to go.
! 							 * XXXHS: As above, cancel means FATAL, for now.
! 							 */
! 							if (MyProc->xmin == 0)
! 								cancelMode = CONFLICT_MODE_NOT_SET;
! 							else
! 								cancelMode = CONFLICT_MODE_FATAL;
  							break;
  
  					default:
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 1259,1265 **** XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  	/*
  	 * Data lives in different places depending upon when snapshot taken
  	 */
! 	if (snapshot->takenDuringRecovery)
  	{
  		/*
  		 * If the snapshot contains full subxact data, the fastest way to check
--- 1259,1265 ----
  	/*
  	 * Data lives in different places depending upon when snapshot taken
  	 */
! 	if (!snapshot->takenDuringRecovery)
  	{
  		/*
  		 * If the snapshot contains full subxact data, the fastest way to check
*** a/src/include/access/nbtree.h
--- b/src/include/access/nbtree.h
***************
*** 536,545 **** typedef BTScanOpaqueData *BTScanOpaque;
  #define SK_BT_DESC			(INDOPTION_DESC << SK_BT_INDOPTION_SHIFT)
  #define SK_BT_NULLS_FIRST	(INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT)
  
- /* XXX probably needs new RMgr call to do this cleanly */
- extern bool btree_is_cleanup_record(uint8 info);
- extern bool btree_needs_cleanup_lock(uint8 info);
- 
  /*
   * prototypes for functions in nbtree.c (external entry points for btree)
   */
--- 536,541 ----
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 54,60 **** extern int	GetTransactionsInCommit(TransactionId **xids_p);
  extern bool HaveTransactionsInCommit(TransactionId *xids, int nxids);
  
  extern PGPROC *BackendPidGetProc(int pid);
- extern PGPROC *BackendXidGetProc(TransactionId xid);
  extern int	BackendXidGetPid(TransactionId xid);
  extern bool IsBackendPid(int pid);
  
--- 54,59 ----
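
For anyone skimming the lockcmds.c hunk above, the new rule for LOCK TABLE
during recovery can be sketched as a small standalone function. This is only
an illustration, not code from the patch: the enum and the helper name
lock_table_allowed() are hypothetical, and the enum merely mirrors the usual
weakest-to-strongest ordering of PostgreSQL's table lock modes.

```c
#include <stdbool.h>

/*
 * Illustrative copy of the table lock modes, weakest first.
 * Names are hypothetical; only the ordering follows PostgreSQL.
 */
typedef enum LockModeSketch
{
	ACCESS_SHARE_LOCK = 1,
	ROW_SHARE_LOCK,
	ROW_EXCLUSIVE_LOCK,
	SHARE_UPDATE_EXCLUSIVE_LOCK,
	SHARE_LOCK,
	SHARE_ROW_EXCLUSIVE_LOCK,
	EXCLUSIVE_LOCK,
	ACCESS_EXCLUSIVE_LOCK
} LockModeSketch;

/*
 * Hypothetical helper: returns true if a LOCK TABLE request with the
 * given mode would be accepted.  Outside recovery every mode is fine;
 * during recovery only ACCESS SHARE is accepted, which also rejects
 * the short default form because it asks for ACCESS EXCLUSIVE.
 */
bool
lock_table_allowed(LockModeSketch mode, bool in_recovery)
{
	if (!in_recovery)
		return true;

	return mode == ACCESS_SHARE_LOCK;
}
```

In the patch itself the rejection is done by calling
PreventCommandDuringRecovery() rather than by returning a flag.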

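The postgres.c hunk above changes how a conflict is resolved for an
idle-in-transaction session. A rough standalone sketch of that decision
follows. The enum, the function name resolve_idle_conflict() and the
holds_snapshot flag are all hypothetical stand-ins (the patch tests
MyProc->xmin directly), and passing other modes through unchanged is an
assumption, since the default branch is not shown in the hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the CONFLICT_MODE_* values in the patch. */
typedef enum ConflictModeSketch
{
	CONFLICT_NOT_SET,
	CONFLICT_ERROR_IF_NOT_IDLE,
	CONFLICT_ERROR,
	CONFLICT_FATAL
} ConflictModeSketch;

/*
 * Decide what happens to an idle-in-transaction session that has been
 * asked to resolve a recovery conflict.
 *
 * holds_snapshot models "MyProc->xmin != 0" in the patch.
 */
ConflictModeSketch
resolve_idle_conflict(ConflictModeSketch requested, bool holds_snapshot)
{
	switch (requested)
	{
		case CONFLICT_ERROR:
			/* An idle session cannot take an ERROR, so escalate. */
			return CONFLICT_FATAL;

		case CONFLICT_ERROR_IF_NOT_IDLE:
			/* With no snapshot there is nothing to conflict with. */
			return holds_snapshot ? CONFLICT_FATAL : CONFLICT_NOT_SET;

		default:
			/* Assumption: anything else is left as requested. */
			return requested;
	}
}
```

The point of the change is the middle case: before, an idle session was
always let off, whereas now it is let off only if it no longer holds a
snapshot.
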
-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
