Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > Here's a bunch of small issues that I spotted: Thanks for going over it again. It is encouraging that you didn't spot any *big* issues. > * The oldserxid code is broken for non-default BLCKSZ. > o The warning will come either too early or too late Good point. That part is easily fixed -- if we want to keep the warning, in light of the next few points. > o There is no safeguard against actually wrapping around the > SLRU, just the warning Any thoughts on what we should do instead? If someone holds open a transaction long enough to burn through a billion transaction IDs (or possibly less if someone uses a smaller BLCKSZ), should we generate a FATAL error? Of course, one other option would be to allow more SLRU segment files, as you raised on another thread. Then this issue goes away because they would hit other, existing, protections against transaction wraparound first. > o I'm suspicious of the OldSerXidPagePrecedesLogically() logic > with 32k BLCKSZ. In that case, OLDSERXID_MAX_PAGE is a larger > than necessary to cover the whole range of 2^32 transactions, > so at high XIDs, say 2^32-1, doesn't it incorrectly think > that low XIDs, say 10, are in the past, not the future? I will look that over to see; but if this is broken, then one of the other SLRU users is probably broken, because I think I stole this code pretty directly from another spot. >> /* >> * Keep a pointer to the currently-running serializable >> * transaction (if any) for quick reference. >> * TODO SSI: Remove volatile qualifier and the then-unnecessary >> * casts? >> */ >> static volatile SERIALIZABLEXACT *MySerializableXact = >> InvalidSerializableXact; > > * So, should we remove it? In most places where contents of > MySerializableXact are modified, we're holding > SerializableXactHashLock in exclusive mode. However, there's > two exceptions: > > o GetSafeSnapshot() modifies MySerializableXact->flags without > any lock. It also inspects > MySerializableXact->possibleUnsafeConflicts without a lock. > What if somone sets some other flag in the flags bitmap just > when GetSafeSnapshot() is about to set > SXACT_FLAG_DEFERRABLE_WAITING? One of the flags might be lost > :-(. > > o CheckForSerializableConflictIn() sets SXACT_FLAG_DID_WRITE > without holding a lock. The same danger is here if someone > else tries to set some other flag concurrently. > > I think we should simply acquire SerializableXactHashLock in > GetSafeSnapshot(). It shouldn't be so much of a hotspot that it > would make any difference in performance. > CheckForSerializableConflictIn() might be called a lot, > however, so maybe we need to check if the flag is set first, > and only acquire the lock and set it if it's not set already. OK. Do checks such as that argue for keeping the volatile flag, or do you think we can drop it if we make those changes? (That would also allow dropping a number of casts which exist just to avoid warnings.) > * Is the SXACT_FLAG_ROLLED_BACK flag necessary? It's only set in > ReleasePredicateLocks() for a fleeting moment while the > function releases all conflicts and locks held by the > transaction, and finally the sxact struct itself containing the > flag. I think that one can go away. It had more of a point many months ago before we properly sorted out what belongs in PreCommit_CheckForSerializationFailure() and what belongs in ReleasePredicateLocks(). The point at which we reached clarity on that and moved things around, this flag probably became obsolete. > Also, isn't a transaction that's already been marked for death > the same as one that has already rolled back, for the purposes > of detecting conflicts? Yes. We should probably ignore any marked-for-death transaction during conflict detection and serialization failure detection. As a start, anywhere there is now a check for rollback and not for this, we should change it to this. There may be some places this can be checked which haven't yet been identified and touched. > * The HavePartialClearThrough/CanPartialClearThrough mechanism > needs a better explanation. The only explanation currently is > this: > >> if (--(PredXact->WritableSxactCount) == 0) >> { >> /* >> * Release predicate locks and rw-conflicts in for >> * all committed transactions. There are no longer any >> * transactions which might conflict with the locks and no >> * chance for new transactions to overlap. Similarly, >> * existing conflicts in can't cause pivots, and any >> * conflicts in which could have completed a dangerous >> * structure would already have caused a rollback, so any >> * remaining ones must be benign. >> */ >> PredXact->CanPartialClearThrough = >> PredXact->LastSxactCommitSeqNo; >> } > > If I understand that correctly, any predicate lock belonging to > any already committed transaction can be safely zapped away at > that instant. Correct. > We don't do it immediately, because it might be expensive. I think it has more to do with getting the LW locks right. We make the call to ClearOldPredicateLocks() farther down in the function, so this is effectively setting things up for that call. > But what is the purpose of HavePartialClearThrough? To avoid doing unnecessary work for completed transactions on which we still need to keep some information but for which we were previously able to clear predicate locks. This relates to the "mitigation" work discussed in these and other posts from around that time: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php I'm happy to work on modifications for any of this or to stay out of your way if you want to. Should I put together a patch for those items where we seem to agree and have a clear way forward? -Kevin
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers