On Sun, Jun 19, 2011 at 09:10:02PM -0400, Robert Haas wrote:
> Is this an open item for 9.1beta3?

Yes. I've put it on the list.

The SxactGlobalXmin and its refcount were getting out of sync with the
actual transaction state. This is timing-dependent, but I can reproduce
it fairly reliably under concurrent workloads.

It looks like the problem comes from the change a couple of days ago
that removed the SXACT_FLAG_ROLLED_BACK flag and changed the
SxactIsRolledBack checks to SxactIsDoomed. That's the correct thing to
do everywhere else, but it gets us in trouble here. We shouldn't be
ignoring doomed transactions in SetNewSxactGlobalXmin, because they're
not eligible for cleanup until the associated backend aborts the
transaction and calls ReleasePredicateLocks.

However, it isn't as simple as just removing the SxactIsDoomed check
from SetNewSxactGlobalXmin. ReleasePredicateLocks calls
SetNewSxactGlobalXmin *before* it releases the aborted transaction's
SerializableXact (it pretty much has to, because we must drop and
reacquire SerializableXactHashLock in between). But we don't want the
aborted transaction included in the SxactGlobalXmin computation.

It seems like we do need that SXACT_FLAG_ROLLED_BACK after all, even
though it's only set for this brief interval. We need to be able to
distinguish a transaction that's just been marked for death (doomed)
from one that's already called ReleasePredicateLocks.

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/
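
To make the distinction concrete, here is a minimal sketch of the
recomputation as a toy model. It is not the actual predicate.c code:
every Toy/TOY_ name, the flag values, and the array-based layout are
assumptions made up for illustration, and the plain "<" comparison
ignores xid wraparound. TOY_FLAG_DOOMED and TOY_FLAG_ROLLED_BACK stand
in for the doomed state and for SXACT_FLAG_ROLLED_BACK; the function
stands in for SetNewSxactGlobalXmin. The point is only that the scan
skips transactions already in their release path, while a transaction
that is merely doomed still counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types and values are illustrative, not PostgreSQL's. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

#define TOY_FLAG_DOOMED      0x1u   /* marked for death; backend has not aborted yet */
#define TOY_FLAG_ROLLED_BACK 0x2u   /* backend is already releasing its predicate locks */

typedef struct ToySxact
{
    ToyXid   xmin;
    unsigned flags;
} ToySxact;

typedef struct ToyGlobalXmin
{
    ToyXid xmin;        /* oldest xmin among transactions that still count */
    int    refcount;    /* how many transactions share that xmin */
} ToyGlobalXmin;

/*
 * Recompute the global xmin and its refcount over all tracked transactions.
 * Only rolled-back entries are skipped; doomed ones are still included,
 * because they are not eligible for cleanup until their backend aborts.
 */
ToyGlobalXmin
toy_set_new_global_xmin(const ToySxact *sxacts, size_t n)
{
    ToyGlobalXmin result = { TOY_INVALID_XID, 0 };

    assert(sxacts != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const ToySxact *s = &sxacts[i];

        if (s->flags & TOY_FLAG_ROLLED_BACK)
            continue;               /* already on its way out: ignore */

        if (result.refcount == 0 || s->xmin < result.xmin)
        {
            result.xmin = s->xmin;  /* new oldest xmin */
            result.refcount = 1;
        }
        else if (s->xmin == result.xmin)
            result.refcount++;      /* another holder of the same xmin */
    }
    return result;
}
```

With three toy transactions at xmin 100 (live), 90 (doomed only), and
80 (doomed and rolled back), this yields xmin 90 with a refcount of 1.
Skipping on the doomed flag instead would give 100, which is the kind
of mismatch described above.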