On Mon, Dec 20, 2010 at 5:32 PM, Florian Pflug <f...@phlo.org> wrote:
> On Dec20, 2010, at 18:54 , Robert Haas wrote:
>> On Mon, Dec 20, 2010 at 12:49 PM, Florian Pflug <f...@phlo.org> wrote:
>>> For me, this is another very good reason to explore this further. Plus, it
>>> improves the ratio of grotty-ness vs. number-of-problems-solved ;-)
>>
>> By all means, look into it further. I fear the boat is filling up
>> with water, but if you manage to come up with a workable solution I'll
>> be as happy as anyone, promise!
>
> I'll try to create a detailed proposal. To do that, however, I'll require
> some guidance on what's acceptable and what's not.
>
> Here's a summary of the preceding discussion:
>
> To deal with aborted transactions correctly, we need to track the last
> locker of a particular tuple that actually committed. If we also want
> to fix the bug that causes a row lock to be lost upon doing
> lock;savepoint;update;restore, that "latest committed locker" will
> sometimes need to be a set, since it'll need to store the outer
> transaction's xid as well as the latest actually committed locker.
>
> As long as no transaction aborts are involved, the tuple's xmax
> contains all the information we need. If a transaction updates,
> deletes or locks a row, the previous xmax is overwritten. If the
> transaction later aborts, we cannot decide whether the tuple had
> previously been locked or not.
>
> And these ideas have come up:
>
> A) Transactions that merely lock a row could put the previous
>    locker's xid (if >= GlobalXmin) *and* their own xid into a multi-xid,
>    and store that in xmax. For shared locks, this merely means cleaning
>    out the existing multi-xid a bit less aggressively. There's
>    no risk of bloat there, since we only need to keep one committed
>    xid, not all of them. For exclusive locks, we currently never
>    create a multi-xid. That'd change; we'd need to create one
>    if we find a previous locker with an xid >= GlobalXmin. This doesn't
>    solve the UPDATE and DELETE cases. For SELECT-FOR-SHARE this
>    is probably the best option, since it comes very close to what
>    we do currently.
>
> B) A transaction that UPDATEs or DELETEs a tuple could create an
>    intermediate lock-only tuple which'd contain the necessary
>    information about previous lock holders. We'd only need to do
>    that if there actually is one with xid >= GlobalXmin. We could
>    then choose whether to do the same for SELECT-FOR-UPDATE, or
>    whether we'd prefer to go with (A).
>
> C) The ctid field is only necessary for updated tuples. We could thus
>    overlay it with a field which stores the last committed locker after
>    a DELETE. UPDATEs could be handled either as in (B), or by storing the
>    information in the ctid-overlay in the *new* tuple. SELECT-FOR-UPDATE
>    could again either also use the ctid overlay or use (A).
>
> D) We could add a new tuple header field xlatest. To support binary
>    upgrade, we'd need to be able to read tuples without that field
>    also. We could then either create a new tuple version upon the
>    first lock request to such a tuple (which would then include the
>    new header), or we could simply raise a serialization error if
>    a serializable transaction tried to update a tuple without the
>    field whose xmax was aborted and >= GlobalXmin.
>
> I have the nagging feeling that (D) will meet quite some resistance. (C)
> wasn't too well received either, though I wonder if that'd change if the
> grotty-ness was hidden behind an API, much like the xvac/cmin/cmax overlay
> is. (B) seems like a lot of overhead, but maybe cleaner. More research is
> needed though to check how it'd interact with HOT and how to get the
> locking right. (A) is IMHO the best solution for the SELECT-FOR-SHARE case
> since it's very close to what we do today.
>
> Any comments? Especially of the "don't you dare" kind?
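
To make the rule in (A) concrete, here is a minimal sketch of how a lock-only transaction might pick the xids to store in xmax. It is not PostgreSQL code: `LockerXid`, `NO_LOCKER` and `lock_only_members` are hypothetical names, the caller is assumed to already know that the previous locker committed, and xids are compared with a plain `>=`, which ignores wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t LockerXid;            /* hypothetical stand-in for a transaction id */
#define NO_LOCKER ((LockerXid) 0)      /* hypothetical "no previous locker" marker */

/*
 * Decide which xids a lock-only transaction would store in xmax under (A).
 * Writes them to members[] and returns the count: 2 means a multi-xid is
 * needed, 1 means the plain xid is enough.
 * Simplification: xids are compared with plain >=, ignoring wraparound.
 */
static size_t
lock_only_members(LockerXid prev_locker, LockerXid my_xid,
                  LockerXid global_xmin, LockerXid members[2])
{
    size_t n = 0;

    assert(my_xid != NO_LOCKER);

    /* Keep the previous locker only if some snapshot may still care about it. */
    if (prev_locker != NO_LOCKER && prev_locker >= global_xmin)
        members[n++] = prev_locker;

    members[n++] = my_xid;
    return n;
}
```

With a GlobalXmin of 50, a previous locker 90 and a current xid 100 yield two members, so a multi-xid would be created even for an exclusive lock. A previous locker of 40 is older than GlobalXmin and is dropped, which is why at most one committed xid ever needs to be carried along.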
I think any solution based on (D) has zero chance of being accepted, and it wouldn't be that high except that probabilities can't be negative. Unfortunately, I don't understand (A) or (B) well enough to comment intelligently. My previously expressed concern about (C) wasn't based on ugliness, but rather on my belief that there is likely a whole lot of code which relies on the CTID being a self-link when no UPDATE has occurred. We'd have to be confident that all such cases had been found and fixed, and that kind of confidence might not be easy to come by.

I have a more "meta" concern, too. Let's suppose (just for example) that you write some absolutely brilliant code which makes it possible to overlay CTID and which has the further advantage of being stunningly obvious, so that we have absolute confidence that it is correct. The obvious question is: if we can overlay CTID in some situations, is there a better use for that than this? Just to throw out something that might be totally impractical, maybe we could get rid of XMAX. If the tuple hasn't been updated, the CTID field stores the XMAX; if it HAS been updated, the CTID points to the successor tuple, whose XMIN is our XMAX. I'm sure there are a bunch of reasons why that doesn't actually work (I can think of some of them myself), but the point is that there's an opportunity cost to stealing those bits. Once they're dedicated to this purpose, they can't ever be dedicated to any other purpose.

Space in the tuple header is precious, and I am not at all convinced that we should be willing to surrender any for this. We have to believe not only that this change is good, but also that it's a better use than any other purpose to which that bit space could potentially be put. Now obviously, overlaying onto space that's already reserved is better than allocating new space. But it's not free.
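
For illustration only, the "get rid of XMAX" thought experiment can be sketched as a toy data structure. Everything here is hypothetical (`SketchXid`, `SketchTuple`, `sketch_xmax`, the `updated` flag, and the use of an array index in place of a real CTID); it shows the shape of the overlay, not something that would work against the actual heap format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;     /* hypothetical stand-in for a transaction id */

/* Toy tuple header: one slot holds either an xmax or a successor link. */
typedef struct SketchTuple
{
    SketchXid xmin;             /* inserting transaction */
    bool      updated;          /* hypothetical flag: which union member is live */
    union
    {
        SketchXid xmax;         /* live when !updated: deleter/locker xid, 0 if none */
        size_t    successor;    /* live when updated: index of the newer version */
    } slot;
} SketchTuple;

/* Recover xmax: read it directly, or take the successor tuple's xmin. */
static SketchXid
sketch_xmax(const SketchTuple *heap, size_t ntuples, size_t idx)
{
    assert(idx < ntuples);
    const SketchTuple *tup = &heap[idx];

    if (!tup->updated)
        return tup->slot.xmax;

    assert(tup->slot.successor < ntuples);
    return heap[tup->slot.successor].xmin;
}
```

Even in this toy form the cost is visible: once the slot is claimed for one overlay, nothing else can be stored there, and reading xmax for an updated tuple now requires fetching a second tuple.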

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company