Michael Paquier wrote:
> On Fri, Sep 12, 2014 at 3:19 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
> > On Mon, Feb 3, 2014 at 3:42 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >> On Wed, Jan 15, 2014 at 2:43 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> >>> On 8 January 2014 08:33, Simon Riggs <si...@2ndquadrant.com> wrote:
> >>>
> >>> Patch attached, implemented to reduce writes by SELECTs only.
> >
> > This patch is registered in this CF. It does not apply anymore and
> > needs a rebase. Robert and Amit have provided as well some comments
> > but they have not been addressed. Is it fair to mark it as "returned
> > with feedback" even if it has not been reviewed within the last month?
> For the time being, status has been changed to "waiting on author".

As it happens, I was studying this patch yesterday on the flight back
home.  I gave it a quick look; I noticed it was in the commitfest and
hadn't seen any review activity for many months, which seemed odd.

Anyway I first read the whole thread to know what to focus on, before
going over the patch itself.  Once I finished reading the emails, I had
a vague idea of how I thought it would work: my thinking was that
heap/index scans would either call heap_page_prune_opt, or not,
depending on whether they were part of a read-only executor node.  So if
you have a query that updates a certain table, and while doing so scans
another table in read-only mode, then the HOT updates would be enabled
for the table being written, but disabled for the one being read.

As it turns out, the patch as written is nothing at all like that, and
TBH I don't think I like it very much.

My idea is that we would have a new executor flag, say
EXEC_FLAG_READ_ONLY; we would set it on nodes that are known to be
read-only, and reset it on those that aren't, such as LockRows and
ModifyTable (obviously we need to pass it down correctly from parent to
children).  Then in ExecInitSeqScan and ExecInitIndexScan, if we see the
flag set, we call heap/index_set_allow_prune(false) for the heap scan;
same thing in index scans.  (I envisioned it as a boolean rather than
enabling a certain number of cleanups per scan.)

I tried to code this but I think it doesn't work correctly, and no time
for debug currently.  Anyway let me know what you think of this general
idea.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2b336b0..b859f89 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -342,9 +342,11 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	snapshot = scan->rs_snapshot;
 
 	/*
-	 * Prune and repair fragmentation for the whole page, if possible.
+	 * Prune and repair fragmentation for the whole page, if possible and
+	 * enabled.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_allow_prune)
+		heap_page_prune_opt(scan->rs_rd, buffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1349,6 +1351,9 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	scan->rs_allow_sync = allow_sync;
 	scan->rs_temp_snap = temp_snap;
 
+	/* HOT pruning is initially allowed; caller can turn it off if desired */
+	scan->rs_allow_prune = true;
+
 	/*
 	 * we can use page-at-a-time mode if it's an MVCC-safe snapshot
 	 */
@@ -1440,6 +1445,12 @@ heap_endscan(HeapScanDesc scan)
 	pfree(scan);
 }
 
+void
+heap_set_allow_prune(HeapScanDesc scan, bool allow)
+{
+	scan->rs_allow_prune = allow;
+}
+
 /* ----------------
  *		heap_getnext	- retrieve next tuple in scan
  *
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 53cf96f..cb2f075 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -253,6 +253,7 @@ index_beginscan(Relation heapRelation,
 	 */
 	scan->heapRelation = heapRelation;
 	scan->xs_snapshot = snapshot;
+	scan->xs_allow_prune = true;
 
 	return scan;
 }
@@ -387,6 +388,12 @@ index_endscan(IndexScanDesc scan)
 	IndexScanEnd(scan);
 }
 
+void
+index_set_allow_prune(IndexScanDesc scan, bool allow_prune)
+{
+	scan->xs_allow_prune = allow_prune;
+}
+
 /* ----------------
  *		index_markpos  - mark a scan position
  * ----------------
@@ -520,9 +527,9 @@ index_fetch_heap(IndexScanDesc scan)
 											 ItemPointerGetBlockNumber(tid));
 
 		/*
-		 * Prune page, but only if we weren't already on this page
+		 * Prune page if enabled, but only if we weren't already on this page
 		 */
-		if (prev_buf != scan->xs_cbuf)
+		if (prev_buf != scan->xs_cbuf && scan->xs_allow_prune)
 			heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf);
 	}
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbd7492..8f4964c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1351,7 +1351,7 @@ BeginCopy(bool is_from,
 		 *
 		 * ExecutorStart computes a result tupdesc for us
 		 */
-		ExecutorStart(cstate->queryDesc, 0);
+		ExecutorStart(cstate->queryDesc, EXEC_FLAG_READ_ONLY);
 
 		tupDesc = cstate->queryDesc->tupDesc;
 	}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5245171..5213f4b 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -232,6 +232,9 @@ GetIntoRelEFlags(IntoClause *intoClause)
 	if (intoClause->skipData)
 		flags |= EXEC_FLAG_WITH_NO_DATA;
 
+	/* XXX this is probably wrong at least for some callers */
+	flags |= EXEC_FLAG_READ_ONLY;
+
 	return flags;
 }
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a718c0b..bbc9e6e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -170,7 +170,10 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
 			 */
 			if (queryDesc->plannedstmt->rowMarks != NIL ||
 				queryDesc->plannedstmt->hasModifyingCTE)
+			{
 				estate->es_output_cid = GetCurrentCommandId(true);
+				eflags &= ~EXEC_FLAG_READ_ONLY;
+			}
 
 			/*
 			 * A SELECT without modifying CTEs can't possibly queue triggers,
@@ -185,6 +188,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
 		case CMD_INSERT:
 		case CMD_DELETE:
 		case CMD_UPDATE:
+			eflags &= ~EXEC_FLAG_READ_ONLY;
 			estate->es_output_cid = GetCurrentCommandId(true);
 			break;
 
@@ -871,6 +875,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		if (bms_is_member(i, plannedstmt->rewindPlanIDs))
 			sp_eflags |= EXEC_FLAG_REWIND;
 
+		/* XXX can we set EXEC_FLAG_READ_ONLY here? */
+
 		subplanstate = ExecInitNode(subplan, estate, sp_eflags);
 
 		estate->es_subplanstates = lappend(estate->es_subplanstates,
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 9b1e975..a43bdb8 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -334,9 +334,11 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 	ntup = 0;
 
 	/*
-	 * Prune and repair fragmentation for the whole page, if possible.
+	 * Prune and repair fragmentation for the whole page, if possible and
+	 * allowed.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	if (scan->rs_allow_prune)
+		heap_page_prune_opt(scan->rs_rd, buffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 2b89dc6..5ff08e5 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -608,6 +608,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 											   estate->es_snapshot,
 											   indexstate->iss_NumScanKeys,
 											 indexstate->iss_NumOrderByKeys);
+	if (eflags | EXEC_FLAG_READ_ONLY)
+		index_set_allow_prune(indexstate->iss_ScanDesc, false);
 
 	/*
 	 * If no run-time keys to calculate, go ahead and pass the scankeys to the
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 990240b..fb270cb 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -293,6 +293,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
 	/* check for unsupported flags */
 	Assert(!(eflags & EXEC_FLAG_MARK));
 
+	eflags &= ~EXEC_FLAG_READ_ONLY;
+
 	/*
 	 * create state structure
 	 */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8ac6047..94c33a6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1078,9 +1078,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ListCell   *l;
 	int			i;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
 
+	/* turn off read-only mode, if set */
+	eflags &= ~EXEC_FLAG_READ_ONLY;
+
 	/*
 	 * create state structure
 	 */
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index ab13e47..f06b049 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -140,6 +140,9 @@ InitScanRelation(SeqScanState *node, EState *estate, int eflags)
 									 estate->es_snapshot,
 									 0,
 									 NULL);
+	/* set pruning option */
+	if (eflags | EXEC_FLAG_READ_ONLY)
+		heap_set_allow_prune(currentScanDesc, true);
 
 	node->ss_currentRelation = currentRelation;
 	node->ss_currentScanDesc = currentScanDesc;
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index f9ed266..83e7f97 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -177,7 +177,7 @@ ProcessQuery(PlannedStmt *plan,
 	/*
 	 * Call ExecutorStart to prepare the plan for execution
 	 */
-	ExecutorStart(queryDesc, 0);
+	ExecutorStart(queryDesc, EXEC_FLAG_READ_ONLY);
 
 	/*
 	 * Run the plan to completion.
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d99158f..0b607ee 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -142,6 +142,7 @@ extern void index_rescan(IndexScanDesc scan,
 			 ScanKey keys, int nkeys,
 			 ScanKey orderbys, int norderbys);
 extern void index_endscan(IndexScanDesc scan);
+extern void index_set_allow_prune(IndexScanDesc scan, bool allow_prune);
 extern void index_markpos(IndexScanDesc scan);
 extern void index_restrpos(IndexScanDesc scan);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7f7166d..daa687c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,6 +113,7 @@ extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
 					 bool allow_strat, bool allow_sync);
 extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
 				  int nkeys, ScanKey key);
+extern void heap_set_allow_prune(HeapScanDesc scan, bool allow_prune);
 extern void heap_rescan(HeapScanDesc scan, ScanKey key);
 extern void heap_endscan(HeapScanDesc scan);
 extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 8a57698..48b7bfe 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -33,6 +33,7 @@ typedef struct HeapScanDescData
 	bool		rs_allow_strat; /* allow or disallow use of access strategy */
 	bool		rs_allow_sync;	/* allow or disallow use of syncscan */
 	bool		rs_temp_snap;	/* unregister snapshot at scan end? */
+	bool		rs_allow_prune;	/* allow HOT-pruning during this scan */
 
 	/* state set up at initscan time */
 	BlockNumber rs_nblocks;		/* number of blocks to scan */
@@ -71,6 +72,7 @@ typedef struct IndexScanDescData
 	ScanKey		keyData;		/* array of index qualifier descriptors */
 	ScanKey		orderByData;	/* array of ordering op descriptors */
 	bool		xs_want_itup;	/* caller requests index tuples */
+	bool		xs_allow_prune;	/* allow HOT-pruning during this scan */
 
 	/* signaling to index AM about killing index tuples */
 	bool		kill_prior_tuple;		/* last-returned tuple is dead */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d167b49..42955eb 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -54,6 +54,9 @@
  * WITH/WITHOUT_OIDS tell the executor to emit tuples with or without space
  * for OIDs, respectively.  These are currently used only for CREATE TABLE AS.
  * If neither is set, the plan may or may not produce tuples including OIDs.
+ *
+ * READ_ONLY tells the executor that the scan requires read-only access.
+ * (we use this to disable HOT pruning on certain nodes)
  */
 #define EXEC_FLAG_EXPLAIN_ONLY	0x0001	/* EXPLAIN, no ANALYZE */
 #define EXEC_FLAG_REWIND		0x0002	/* need efficient rescan */
@@ -63,6 +66,7 @@
 #define EXEC_FLAG_WITH_OIDS		0x0020	/* force OIDs in returned tuples */
 #define EXEC_FLAG_WITHOUT_OIDS	0x0040	/* force no OIDs in returned tuples */
 #define EXEC_FLAG_WITH_NO_DATA	0x0080	/* rel scannability doesn't matter */
+#define EXEC_FLAG_READ_ONLY		0x0100	/* read-only node */
 
 
 /*
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to