Greg Stark wrote: > Kevin Grittner wrote: >> One place I'm particularly interested in using such a feature is >> in pg_dump. Without it we have the choice of using a SERIALIZABLE >> transaction, which might fail or cause failures (which doesn't >> seem good for a backup program) or using REPEATABLE READ (to get >> current snapshot isolation behavior), which might capture a view >> of the data which contains serialization anomalies. > > I'm puzzled how pg_dump could possibly have serialization > anomalies. Snapshot isolation gives pg_dump a view of the database > containing all modifications committed before it started and no > modifications which committed after it started. Since pg_dump makes > no database modifications itself it can always just be taken to > occur instantaneously before any transaction which committed after > it started. Well, in the SQL-92 standard[1] the definition of serializable transactions was changed to the following: | The execution of concurrent SQL-transactions at isolation level | SERIALIZABLE is guaranteed to be serializable. A serializable | execution is defined to be an execution of the operations of | concurrently executing SQL-transactions that produces the same | effect as some serial execution of those same SQL-transactions. A | serial execution is one in which each SQL-transaction executes to | completion before the next SQL-transaction begins. It hasn't changed since then. Some people in the community cling to the notion, now obsolete for almost two decades, that serializable transactions are defined by the same anomalies which define the other three levels of transaction isolation. (It's probably time to catch up on that one.) Note that the above does *not* say "it's OK for a SELECT in a transaction executing at the serializable isolation level to produce results which are not consistent with any serial execution of serializable transactions, as long as the database *eventually* reaches a state where a repeat of the same SELECT in a new transaction produces results consistent with such execution." Under the standard, even read-only transactions have to follow the rules, and I think most people would want that. Now, commit order in itself doesn't directly affect the apparent order of execution. It's only directs the apparent order of execution to the extent that multiple transactions access the same data. Read-read "conflicts" don't matter in this scheme, and write-write conflicts are already handled under snapshot isolation by preventing both writers from committing -- if one commits, the other is forced to roll back with a serialization failure. That leaves write-read and read-write conflicts. In write-read conflicts, one transaction writes data and then commits in time for another transaction to see it. This implies that the writing transaction came first first in the apparent order of execution. Now for the tricky one: in read-write conflicts (often written as rw-conflict) the reading transaction cannot see the write of a concurrent transaction because of snapshot visibility rules. Since the reading tranaction is unable to see the work of the writing transaction, it must be considered to have come first in the apparent order of execution. In order to have a serialization anomaly under snapshot isolation, you need a situation like this: a transaction which I'll call T0 (matching much discussion on the topic published in recent years) has a rw-dependency on a transaction concurrent to T0, which I'll call T1. In addition, T1 has a rw-dependency on a transaction which is concurrent to it, which I'll call TN. The reason it's not T2 is that it can be the same transaction as T0 or a third transaction. So, because of the rw-conflicts, T0 appears to execute before T1, which appears to execute before TN. (At this point it should be obvious why you've got a problem if T0 and TN are the same transaction.) If T0 and TN are distinct, we still haven't quite met the conditions required to produce a serialization anomaly, however, The next requirement is that TN (the third in apparent order of execution) actually commits first. At this point, the third transaction's writes are exposed for the world to see, while there are still two uncommitted tranactions which appear to have committed first. There are so many ways that this can lead to a cycle in apparent order of execution, some of which can happen in the client application, that Serializable Snapshot Isolaiton (SSI) doesn't pretend to track that. Barring one exception that I worked out myself (although I'd be shocked if someone didn't beat me to it and I just haven't found a paper describing it in my researches), the above describes the conditions under which one of the transactions must be rolled back to prevent a serialization anomaly. The exception is interesting here, though. It is that if T0 is a READ ONLY transaction, it can't participate in an anomaly unless TN commits before T0 acquires its snapshot. This observation is based on the fact that since a READ ONLY transaction can't appear to come after another transaction based on a rw-conflict, I can see only two ways that it can appear to come after TN and thereby complete the cycle in the apparent order of execution: (1) There is a wr-conflict where T0 successfully reads a write from TN, or (2) application software outside the database receives confirmation of the commit of TN before it starts T0. [If anyone can see a way to complete the cycle in apparent order of execution when T0 is READ ONLY without having TN commit before T0 acquires its snapshot, please let me know. I'm basing several optimizations on this, and it is an innovation I have not yet found mentioned in the literature.] OK, to get back to the question -- pg_dump's transaction (T0) could see an inconsistent version of the database if one transaction (TN) writes to a table, another transaction (T1) overlaps TN and can't read something written by TN because they are concurrent, TN commits before T0 acquires its snapshot, T1 writes to a table, T0 starts before T1 commits, and T0 can't read something which T1 wrote (which is sort of a given for a database dump and overlapping transactions). So that's the theory. For a practical example, consider the receipting example which I've posted to the list multiple times before. (TN updates a control record with a deposit date, and T1 is a receipt which uses the deposit date from the control record.) In this not-atypical accounting system, any date before the current deposit date is "closed" -- the receipts for each previous date were set from a control record before it advanced. If pg_dump's transaction plays the role of T0 here, it could capture the updated control record, but not a receipt based on the prior value of that control record. That is, it captures the effects of a *later* transaction, but not the *earlier* one. One last thing -- if anyone who has read the Serializable Wiki page didn't already understand the reason why I had a concern about pg_dump here, and the above helped clarify it, feel free to draw from this and update the Wiki page, or let me know what was helpful, and I will. -Kevin [1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers