On 2015-04-16 09:43:54 -0700, Peter Geoghegan wrote: > On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <and...@anarazel.de> wrote: > > I'm, completely independent of logical decoding, of the *VERY* strong > > opinion that 'speculative insertions' should never be visible when > > looking with normal snapshots. For one it allows to simplify > > considerations around wraparound (which has proven to be a very good > > idea, c.f. multixacts + vacuum causing data corruption years after it > > was thought to be harmless). For another it allows to reclaim/redefine > > the bit after a database restart/upgrade. Given how complex this is and > > how scarce flags are that seems like a really good property. > > > > And avoiding those flags to be visible to the outside requires a WAL > > record, otherwise it won't be correct on the standby. > > I'm a bit distracted here, and not sure exactly what you mean. What's > a normal snapshot?
Normal visibility semantics, i.e. SnapshotMVCC, not SnapshotDirty. > Do you just mean that you think that speculative insertions should be > explicitly affirmed by a second record (making it not a speculative > tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has > no chance of seeing a tuple until it was affirmed by a second in-place > modification, regardless of tuple xmin xact commit status? Yes. I think a) HEAP_SPECULATIVE should never be visible outside in a committed transaction. That allows us to redefine what exactly the bit means and such after a simple restart. On IM Heiki said he wants to replace this by a bit in the item pointer, but I don't think that changes things substantially. b) t_ctid should not contain a speculative token in committed (i.e. visible to other sessions using mvcc semantics) tuple. Right now (i.e. master) t_ctid will point to itself for non-updated tuples. I don't think it's good to have something in there that's not an actual ctid in there. The number of places that look into t_ctid for in-progress insertions of other sessions is smaller than the ones that look at tuples in general. c) Cleaning the flag/ctid after a completed speculative insertion makes it far less likely that we end up waiting on a other backend when we wouldn't have to. If a session inserts a large number of tuples speculatively (surely *not* a unlikely thing in the long run) it gets rather likely that tokens are reused. Which means if another backend touches these in-progress records it's quite possible that the currently acquired token is the same as the one on a tuple that has actually finished inserting. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers