Here's a shot at a more radical revision, to try to address concerns raised over my failure in the previous (very minimal) suggested patch to address PostgreSQL behavior close to where the spec's behavior is described, and my dragging in of language directly from the spec in a confusing context. I'd appreciate any corrections or suggestions before I massage it into sgml. Also, I don't know if I should leave it with the one example or whether there should be more. I could leave in the old example, although the popular example of reversing updates (one transaction updates all card rows to 'face-up' where they are 'face-down' and vice versa) seems easier to understand. One or both of these? Other suggestions? Also, I tried using SELECT FOR SHARE and SELECT FOR HOLD as the complete solution or instead of one of the table locks, but was able to generate anomalies in all such cases. If someone has a less extreme technique for blocking the anomalies in the receipt example (even when the SELECT of the deposit date for a receipt is in a separate statement earlier in the transaction), please let me know, so that I can include it. If time permits I might take a stab at expanding the section on data consistency checks at the application level; however, that seems less urgent than correcting the obsolescent discussions of the SQL standard and describing some of the anomalies not covered by a discussion of consistency checks. Thanks, -Kevin 13.2. Transaction Isolation The SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions, with additional constraints on Serializable transactions. These undesirable phenomena are: dirty read A transaction reads data written by a concurrent uncommitted transaction. nonrepeatable read A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read). phantom read A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. The four transaction isolation levels and the corresponding behaviors are described in Table 13-1. Table 13-1. SQL Transaction Isolation Levels <table here> The standard also requires that serializable transactions behave as though they were run one at a time, even though their execution may actually overlap. Since the phenomena described above relate to the visibility of the effects of concurrent transactions, and each serializable transaction must behave as though it were run in its entirety either before or after every other transaction, none of the above phenomena can occur within a serializable transaction. In practice there is another popular transaction isolation level, not mentioned in the standard, generally known as Snapshot isolation level. A transaction executed at this transaction isolation level sees a consistent view of the data; changes made by other transactions are not visible to it. Because of this, none of the phenomena described above are possible. Additionally, when concurrent transactions running at this level attempt to modify the same data, the update conflict causes causes transaction rollback to prevent many forms of update anomalies. Still, data may be viewed or stored in a state which is not consistent with any serial execution of transactions run at this level, so although it is more strict than required for Repeatable Read, it does not meet the standard's definition of the Serializable transaction isolation level. In PostgreSQL you can request any of the four standard transaction isolation levels, but internally there are only two distinct isolation levels, which correspond to the levels Read Committed and Snapshot. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read or Serializable you really get Snapshot. Since the standard does not provide for the Snapshot isolation level, PostgreSQL reports it as Serializable. The behavior of the available isolation levels is detailed in the following subsections. To set the transaction isolation level of a transaction, use the command SET TRANSACTION. or specify the desired transaction isolation level on a BEGIN TRANSACTION or START TRANSACTION statement. 13.2.1. Read Committed Isolation Level <unchanged> 13.2.2. Snapshot Isolation Level The Snapshot level (reported as Serializable) provides the strictest transaction isolation available in PostgreSQL. This level approximates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, applications using this level must be prepared to retry transactions due to serialization failures. When a transaction is on the this level, a SELECT query sees only data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is different from Read Committed in that the SELECT sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. Thus, successive SELECT commands within a single transaction always see the same data. UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the transaction will be rolled back with the message ERROR: could not serialize access due to concurrent update because a snapshot transaction cannot modify or lock rows changed by other transactions after the transaction began. When the application receives this error message, it should abort the current transaction and then retry the whole transaction from the beginning. The second time through, the transaction sees the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update. Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts. The Snapshot mode provides a rigorous guarantee that each transaction sees and modifies an unchanging view of the database. However, the application has to be prepared to retry transactions when concurrent updates make it impossible to update the original snapshot. Since the cost of redoing complex transactions might be significant, this mode is recommended only when updating transactions contain logic sufficiently complex that they might give wrong answers in Read Committed mode. Most commonly, Snapshot mode is necessary when a transaction executes several successive commands that must see identical views of the database. 13.2.2.1. PostgreSQL Serializable Isolation versus True Serializability With true serializability, any database transaction which can be shown to be correct and safe if run by itself is automatically safe when run in any mix of serializable transactions. PostgreSQL's MVCC framework, snapshot isolation, and limited automatic row-level locking permit a greater degree of concurrency than some other databases; however, even when the transaction isolation level is set to serializable, serialization anomalies can occur in some situations. When it is important to prevent these anomalies, explicit row-level or table-level locking can be used at the expense of reduced concurrency. Since PostgreSQL protects a Serializable transaction against changes in the view of the data, and uses locks to prevent modification of data which is being modified by a concurrent transaction, the anomalies can only occur when a transaction reads data which is modified by a concurrent transaction, and uses that as the basis for database modifications which are read by a concurrent transaction. Data consistency checks at the application level have a problem with this in general, and are addressed in section 13.4. Some examples of other types of anomalies follow, with suggestions on how to use explicit locking to prevent the anomalies where needed. Consider a system which records receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Each transaction which is inserting a receipt selects the deposit date from the control table within its transaction, and uses it for the receipt's deposit date. Somewhere mid-afternoon the control table's date is updated, all subsequent receipts should fall into the new day, and a report is run listing the receipts for the day and giving the deposit total. Serializable transaction isolation mode is requested for every transaction involved. If all transactions involved were truly serializable, any successful SELECT of receipts for a date prior to the deposit date of the control table (as shown by a SELECT in the same transaction) would see the complete, final set of receipts for the completed deposit date. To preserve this view of the data, one or more transactions might need to block until the commit or rollback of other transactions, and one or more transaction might need to be rolled back and run again from the start. Under the PostgreSQL implementation there is no blocking and no chance of rollbacks; however, there can be a window of time during which a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if Serializable mode is requested for all transactions modifying and viewing data. This window of time runs from the commit of the transaction which updated the control table until the commit of any pending transactions which are inserting receipts and which obtained a snapshot before the update of the control table. A database transaction, even if declared Serializable, which selects the sum of the receipts for the closed date and saves it into a daily deposit table, will persist an inaccurate value if run during this same window of time. Alarming as this might sound, if the transactions which insert receipts commit very quickly after the snapshot is obtained, this anomaly might never appear in a real-life system. If the risk is acceptable, or can be controlled by external means, such as delaying the run of the daily receipt report for a few seconds after the update of the control table, no further action is required. It is up to the software developer to recognize where the risk is unacceptable and to decide on a technique to control each known conflict. To prevent the anomaly described above, transactions which insert receipts and transactions which advance the deposit date must both acquire a common lock as their first step. This lock could be acquired on either the control table or the receipt table.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers