On 2012-12-11 21:05:51 -0500, Joachim Wieland wrote:
> On Tue, Dec 11, 2012 at 6:52 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> > One problem I see is that while exporting a snapshot solves the
> > visibility issues of the table's contents, it does not protect against
> > schema changes. I am not sure whether that's a problem.
> >
> > If somebody runs a CLUSTER or something like that, the table's contents
> > will be preserved, including MVCC semantics. That's fine.
> > The more problematic cases I see are TRUNCATE, DROP and ALTER
> > TABLE.
>
> This is why the pg_dump master process executes a
>
>     LOCK TABLE <table> IN ACCESS SHARE MODE
>
> for every table, so your commands would all block.
>
> In fact it's even more complicated, because the pg_dump worker
> processes also need to lock the table. They try to get a similar lock
> in "NOWAIT" mode right before dumping the table. If they don't get the
> lock, that means that somebody else is waiting for an exclusive lock
> (this is the case you describe) and the backup will fail.
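For readers following along, the locking dance described above corresponds roughly to the following SQL (a sketch; the exact statements pg_dump issues may differ between versions, and public.test is just a placeholder table name):

```sql
-- Master process: take a shared lock on every table to be dumped.
-- TRUNCATE, DROP TABLE and ALTER TABLE all need ACCESS EXCLUSIVE,
-- which conflicts with ACCESS SHARE, so such DDL blocks behind this.
LOCK TABLE public.test IN ACCESS SHARE MODE;

-- Worker process, right before dumping the table: try a similar lock,
-- but with NOWAIT. If this fails, somebody queued an exclusive lock
-- after the master's lock was granted, and the dump errors out rather
-- than deadlock against that waiter.
LOCK TABLE public.test IN ACCESS SHARE MODE NOWAIT;
```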
[Tom explains why this is problematic way better than I do, see his email]

A trivial example - you can play nastier games than that though:

S1: CREATE TABLE test(id int);
S1: INSERT INTO test VALUES(1), (2);
S1: BEGIN;
S1: ALTER TABLE test RENAME COLUMN id TO id2;
S2: pg_dump
S2: waits
S1: COMMIT;
pg_dump: [archiver (db)] query failed: ERROR:  column "id" of relation "test" does not exist
pg_dump: [archiver (db)] query was: COPY public.test (id) TO stdout;

> > Problem 2:
> >
> > To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
> > needs to be alive. That's currently solved by exporting the snapshot in
> > the walsender connection that did the INIT_LOGICAL_REPLICATION. The
> > question is how long should we preserve that snapshot?
>
> You lost me on this one after the first sentence... But in general the
> snapshot isn't so much a magical thing: As long as the exporting
> transaction is open, it guarantees that there is a transaction out
> there that is holding off vacuum from removing data, and it's also
> guaranteeing that the snapshot as-is has existed at some time in the
> past.
>
> Once it is applied to another transaction, you now have two
> transactions that will hold off vacuum because they share the same
> xmin,xmax values. You could also just end the exporting transaction at
> that point.

I should probably have given a bit more context here... All this is in
the context of decoding WAL back into something useful for the purpose
of replication. This is done in the already existing walsender process,
using the commands I explained in the original post.

What we do there is walk the WAL from some point until we can assemble
a snapshot for exactly that LSN. Several preconditions need to be met
there (like a low enough xmin horizon, so the old catalog entries are
still around). Using that snapshot we can now decode the entries stored
in the WAL.
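For context, the snapshot sharing Joachim describes already works today between two regular backends (as opposed to a walsender connection); the snapshot identifier below is only an example value, the actual string returned by pg_export_snapshot() will differ:

```sql
-- Session 1: open a serializable/repeatable-read transaction and
-- export its snapshot. The transaction must stay open for the
-- snapshot to remain importable.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns an id such as '000003A1-1'

-- Session 2: adopt the exported snapshot. Both transactions now see
-- exactly the same data, and both hold off vacuum via the same xmin.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';

-- Once session 2 has imported the snapshot, session 1 could COMMIT;
-- session 2 keeps the data visible on its own from then on.
```

The walsender problem discussed below is precisely that this COMMIT-based lifetime management isn't available there.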
If you set up a new replica though, getting all changes from one point
where you're consistent isn't all that interesting if you cannot also
get a backup from exactly that point in time, because otherwise it's
very hard to start with something sensible on the standby without
locking everything down.

So the idea is that once we have found that consistent snapshot, we
also export it. Only we are in a walsender connection, which doesn't
expose transactional semantics, so the snapshot cannot be
"deallocated" by COMMIT;ing.

So the question is basically about how to design the user interface of
that deallocation.

Makes slightly more sense now?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers