On Mon, Dec 8, 2014 at 9:31 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Dec 8, 2014 at 4:56 AM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> I don't immediately see the problem either, but I have to say that
>> grovelling through all the resource owners seems ugly anyway. Resource
>> owners are not meant to be traversed like that. And there could be a lot of
>> them, and you have to visit every one of them. That could get expensive if
>> there are a lot of resource owners.
>
> 1. I don't really see why resource owners shouldn't be traversed like
> that.  They are clearly intended to form a hierarchy, and there's
> existing code that recurses through the hierarchy from a given level
> downward.  What's ugly about that?

Here's a patch.  I looked at the issue of tracking parent-less
resource owners a bit more closely.  It turns out that resource owners
are always created in TopMemoryContext "since they should only be
freed explicitly" (cf. resowner.c).  I was a bit worried about that,
because it would be bad to keep a list of all parent-less resource
owners if list elements could vanish in a context reset.  But that
doesn't seem to be a concern.  So I stole the nextchild links of
parent-less resource owners, which are not used for anything
currently, to keep a list of such resource owners.

It occurred to me that there's probably not much value in recomputing
xmin when the active snapshot stack is non-empty.  It's not impossible
that a PL/pgsql function could close a cursor with an old xmin and
then do lots of other work (or just sleep for a long time) before
returning to the top-level, but it is pretty unlikely.  So the
attached patch only considers recomputing the advertised xmin when the
active snapshot stack is empty.  That'll happen at the end of each
statement, which seems soon enough.

Upthread, I suggested keeping a tally of the number of snapshots with
the advertised xmin and recomputing the xmin to advertise only when it
reaches 0.  This patch doesn't implementation that optimization, but
it does have code that aborts the traversal of the resource owner
hierarchy as soon as we see an xmin that will preclude advancing our
advertised xmin.  Releasing N resource owners could therefore cost
O(N^2) in the worst case, but note that releasing N resource owners is
*already* an O(N^2) operation in the worst case, because the list of
children of a particular parent resource owner is singly linked, and
thus deleting a resource owner is O(N).  It's been that way for an
awfully long time without anyone complaining, probably because (a)
it's not particularly common to have large numbers of cursors open
simultaneously and (b) even if you do have that, the constant factor
is pretty low.

Following Heikki's previous suggestion, the patch contains checks to
make sure that we find exactly the number of registered snapshots that
we expect to find.  We could consider demoting these to asserts or
something, but this is more likely to catch bugs if there are any.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 0955bcc..186fa91 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -113,6 +113,7 @@ typedef struct ResourceOwnerData
 ResourceOwner CurrentResourceOwner = NULL;
 ResourceOwner CurTransactionResourceOwner = NULL;
 ResourceOwner TopTransactionResourceOwner = NULL;
+ResourceOwner FirstRootResourceOwner = NULL;
 
 /*
  * List of add-on callbacks for resource releasing
@@ -167,6 +168,11 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
 		owner->nextchild = parent->firstchild;
 		parent->firstchild = owner;
 	}
+	else
+	{
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
+	}
 
 	return owner;
 }
@@ -442,6 +448,8 @@ ResourceOwnerDelete(ResourceOwner owner)
 	 * the owner tree.  Better a leak than a crash.
 	 */
 	ResourceOwnerNewParent(owner, NULL);
+	Assert(FirstRootResourceOwner == owner);
+	FirstRootResourceOwner = owner->nextchild;
 
 	/* And free the object. */
 	if (owner->buffers)
@@ -502,6 +510,14 @@ ResourceOwnerNewParent(ResourceOwner owner,
 			}
 		}
 	}
+	else
+	{
+		ResourceOwner *link = &FirstRootResourceOwner;
+
+		while (*link != owner)
+			link = &((*link)->nextchild);
+		*link = owner->nextchild;
+	}
 
 	if (newparent)
 	{
@@ -513,7 +529,8 @@ ResourceOwnerNewParent(ResourceOwner owner,
 	else
 	{
 		owner->parent = NULL;
-		owner->nextchild = NULL;
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
 	}
 }
 
@@ -1161,6 +1178,50 @@ ResourceOwnerForgetSnapshot(ResourceOwner owner, Snapshot snapshot)
 }
 
 /*
+ * Invoke a caller-supplied function on every snapshot registered with this
+ * resource owner or any descendent resource owner.  The callback can abort
+ * the traversal by returning true.
+ */
+bool
+ResourceOwnerWalkSnapshots(ResourceOwner owner,
+						   ResourceWalkSnapshotCallback callback,
+						   void *arg)
+{
+	ResourceOwner child;
+	int		i;
+
+	/* Invoke the callback for each of our snapshots */
+	for (i = 0; i < owner->nsnapshots; ++i)
+		if (callback(owner->snapshots[i], arg))
+			return true;
+
+	/* Recurse to handle descendants */
+	for (child = owner->firstchild; child != NULL; child = child->nextchild)
+		if (ResourceOwnerWalkSnapshots(child, callback, arg))
+			return true;
+
+	return false;
+}
+
+/*
+ * Invoke a caller-supplied function on every snapshot registered with any
+ * resource owner in the system.
+ */
+bool
+ResourceOwnerWalkAllSnapshots(ResourceWalkSnapshotCallback callback, void *arg)
+{
+	ResourceOwner	owner;
+
+	for (owner = FirstRootResourceOwner;
+		 owner != NULL;
+		 owner = owner->nextchild)
+		if (ResourceOwnerWalkSnapshots(owner, callback, arg))
+			return true;
+
+	return false;
+}
+
+/*
  * Debugging subroutine
  */
 static void
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index d601efe..50e348c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -122,13 +122,15 @@ typedef struct ActiveSnapshotElt
 /* Top of the stack of active snapshots */
 static ActiveSnapshotElt *ActiveSnapshot = NULL;
 
-/*
- * How many snapshots is resowner.c tracking for us?
- *
- * Note: for now, a simple counter is enough.  However, if we ever want to be
- * smarter about advancing our MyPgXact->xmin we will need to be more
- * sophisticated about this, perhaps keeping our own list of snapshots.
- */
+/* Data for recomputing xmin */
+typedef struct SnapshotXminData
+{
+	TransactionId	previous_xmin;
+	TransactionId	new_xmin;
+	int				snapshots_remaining;
+} SnapshotXminData;
+
+/* How many snapshots is resowner.c tracking for us? */
 static int	RegisteredSnapshots = 0;
 
 /* first GetTransactionSnapshot call in a transaction? */
@@ -153,6 +155,7 @@ static List *exportedSnapshots = NIL;
 
 static Snapshot CopySnapshot(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
+static bool SnapshotResetXminWorker(Snapshot snapshot, void *data);
 static void SnapshotResetXmin(void);
 
 
@@ -688,12 +691,65 @@ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
  * If there are no more snapshots, we can reset our PGXACT->xmin to InvalidXid.
  * Note we can do this without locking because we assume that storing an Xid
  * is atomic.
+ *
+ * Even if there are some remaining snapshots, we may be able to advance our
+ * PGXACT->xmin to some degree.  This typically happens when a portal is
+ * dropped.  For efficiency, we only consider recomputing PGXACT->xmin when
+ * the active snapshot stack is empty.
  */
 static void
 SnapshotResetXmin(void)
 {
-	if (RegisteredSnapshots == 0 && ActiveSnapshot == NULL)
+	SnapshotXminData	sxd;
+	ListCell	   *lc;
+
+	if (ActiveSnapshot != NULL)
+		return;
+
+	if (RegisteredSnapshots == 0)
+	{
 		MyPgXact->xmin = InvalidTransactionId;
+		return;
+	}
+
+	sxd.previous_xmin = MyPgXact->xmin;
+	sxd.new_xmin = InvalidTransactionId;
+	sxd.snapshots_remaining = RegisteredSnapshots;
+
+	if (FirstXactSnapshot != NULL &&
+		SnapshotResetXminWorker(FirstXactSnapshot, &sxd))
+		return;
+
+	foreach(lc, exportedSnapshots)
+	{
+		Snapshot	snapshot = lfirst(lc);
+
+		if (SnapshotResetXminWorker(snapshot, &sxd))
+			return;
+	}
+
+	if (ResourceOwnerWalkAllSnapshots(SnapshotResetXminWorker, &sxd))
+		return;
+
+	if (sxd.snapshots_remaining > 0)
+		elog(FATAL, "failed to traverse all snapshots");
+
+	MyPgXact->xmin = sxd.new_xmin;
+}
+
+static bool
+SnapshotResetXminWorker(Snapshot snapshot, void *data)
+{
+	SnapshotXminData   *sxd = data;
+
+	if (!TransactionIdIsValid(sxd->new_xmin)
+		|| TransactionIdPrecedes(snapshot->xmin, sxd->new_xmin))
+		sxd->new_xmin = snapshot->xmin;
+
+	if (--sxd->snapshots_remaining < 0)
+		elog(FATAL, "traversed too many snapshots");
+
+	return !TransactionIdPrecedes(sxd->previous_xmin, sxd->new_xmin);
 }
 
 /*
diff --git a/src/include/utils/resowner_private.h b/src/include/utils/resowner_private.h
index b8fd1a9..aabbb07 100644
--- a/src/include/utils/resowner_private.h
+++ b/src/include/utils/resowner_private.h
@@ -73,6 +73,11 @@ extern void ResourceOwnerRememberSnapshot(ResourceOwner owner,
 							  Snapshot snapshot);
 extern void ResourceOwnerForgetSnapshot(ResourceOwner owner,
 							Snapshot snapshot);
+typedef bool (*ResourceWalkSnapshotCallback) (Snapshot snapshot, void *arg);
+extern bool ResourceOwnerWalkSnapshots(ResourceOwner resowner,
+		ResourceWalkSnapshotCallback callback, void *arg);
+extern bool ResourceOwnerWalkAllSnapshots(ResourceWalkSnapshotCallback callback,
+							  void *arg);
 
 /* support for temporary file management */
 extern void ResourceOwnerEnlargeFiles(ResourceOwner owner);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to