On 2013-10-22 10:52:48 -0400, Robert Haas wrote: > On Fri, Oct 18, 2013 at 2:26 PM, Andres Freund <and...@2ndquadrant.com> wrote: > > So. As it turns out that solution isn't sufficient in the face of VACUUM > > FULL and mixed DML/DDL transaction that have not yet been decoded. > > > > To reiterate, as published it works like: > > For every modification of catalog tuple (insert, multi_insert, update, > > delete) that has influence over visibility issue a record that contains: > > * filenode > > * ctid > > * (cmin, cmax) > > > > When doing a visibility check on a catalog row during decoding of mixed > > DML/DDL transaction lookup (cmin, cmax) for that row since we don't > > store both for the tuple. > > > > That mostly works great. > > > > The problematic scenario is decoding a transaction that has done mixed > > DML/DDL *after* a VACUUM FULL/CLUSTER has been performed. The VACUUM > > FULL obviously changes the filenode and the ctid of a tuple, so we > > cannot successfully do a lookup based on what we logged before. > > So I have a new idea for handling this problem, which seems obvious in > retrospect. What if we make the VACUUM FULL or CLUSTER log the old > CTID -> new CTID mappings? This would only need to be done for > catalog tables, and maybe could be skipped for tuples whose XIDs are > old enough that we know those transactions must already be decoded.
Ah. If it only were so simple ;). That was my first idea, and after I'd bragged in an 2ndq internal chat that I'd found a simple idea I obviously had to realize it doesn't work. Consider: INIT_LOGICAL_REPLICATION; CREATE TABLE foo(...); BEGIN; INSERT INTO foo; ALTER TABLE foo ...; INSERT INTO foo; COMMIT TX 3; VACUUM FULL pg_class; START_LOGICAL_REPLICATION; When we decode tx 3 we haven't yet read the mapping from the vacuum freeze. That scenario can happen either because decoding was stopped for a moment, or because decoding couldn't keep up (slow connection, whatever). There also can be nasty variations where the VACUUM FULL happens while a transaction is writing data since we don't hold locks on system relations for very long. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers