While discussing potential changes to the PostgreSQL documentation of transaction isolation levels, Emmanuel Cecchet pointed out an intriguing new paper[1] on an algorithm to provide true serializable behavior in an MVCC-based database with no additional blocking; although, there being no such thing as a free lunch, there is an increase in serialization failures under contention.

I have been hesitant to raise the issue while everyone was busy trying to wrap up release 8.4; but since that is now in beta testing and PGCon is fast approaching, I wanted to put the issue out there so that people have a chance to review it and discuss it.

Michael Cahill has given me permission to quote from the paper. He has also provided a URL to his personal version of the work[2], which people may access directly for their personal use, although redistribution is prohibited by the ACM copyright. He has asked to be copied on the discussion here.

I know that some here have questioned why anyone would want serializable transactions. Our environment has 21 programmers, 21 business process analysts, and four DBAs. A major part of this staff's time goes to enhancing existing software and developing new software. We have many distinct databases, the largest of which has a schema of over 300 tables. (That's well normalized and not partitioned -- the structure of the data really is that complex.) We have over 8,700 queries against these various databases, including OLTP, reporting, statistics, public access, web queries, etc. If one were to go through the well-known techniques to identify all possible interactions between these queries against these tables, not only would it be a massive undertaking, but the results would immediately be obsolete.

The nice thing about serializable transactions is that if you can show that a transaction does the right thing when run by itself, you automatically know that it will function correctly when run in any mix, or it will fail with a serialization error and can be safely retried. Our framework is designed so that serialization errors are automatically detected and the transaction is retried without any user interaction or application programming needed -- a serialization failure looks the same to the application code and the user as simple blocking.

Quoting the paper's abstract:

"Many popular database management systems offer snapshot isolation rather than full serializability. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that individually maintain consistency. Until now, the only way to prevent these anomalies was to modify the applications by introducing artificial locking or update conflicts, following careful analysis of conflicts between all pairs of transactions.

"This paper describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation and performance study of the algorithm are described, showing that the throughput approaches that of snapshot isolation in most cases."
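To make the kind of anomaly the abstract alludes to concrete, here is a minimal sketch of the classic "write skew" case, using Python and psycopg2. The DSN, the doctors table, and the row values are all made up for illustration -- none of this is from the paper. Each transaction, run by itself, preserves the rule that at least one doctor stays on call; interleaved under snapshot isolation (PostgreSQL's REPEATABLE READ), both commit and the rule is silently broken:

    # Minimal write-skew sketch.  Invariant: at least one doctor
    # must remain on call.  DSN, table, and rows are hypothetical.
    import psycopg2

    def connect():
        conn = psycopg2.connect("dbname=test")   # hypothetical DSN
        conn.set_session(isolation_level="REPEATABLE READ")
        return conn

    setup = connect()
    cur = setup.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS doctors"
                " (name text, on_call bool)")
    cur.execute("DELETE FROM doctors")
    cur.execute("INSERT INTO doctors VALUES"
                " ('alice', true), ('bob', true)")
    setup.commit()

    s1, s2 = connect(), connect()
    c1, c2 = s1.cursor(), s2.cursor()

    # Both snapshots see two doctors on call ...
    c1.execute("SELECT count(*) FROM doctors WHERE on_call")
    c2.execute("SELECT count(*) FROM doctors WHERE on_call")
    assert c1.fetchone()[0] == 2 and c2.fetchone()[0] == 2

    # ... so each transaction concludes it is safe for "its"
    # doctor to go off call.  The updates touch different rows,
    # so snapshot isolation sees no update conflict.
    c1.execute("UPDATE doctors SET on_call = false"
               " WHERE name = 'alice'")
    c2.execute("UPDATE doctors SET on_call = false"
               " WHERE name = 'bob'")
    s1.commit()
    s2.commit()   # both succeed; nobody is left on call

Under two-phase locking one of the two would block; under the paper's algorithm one of them would get a serialization failure and could simply be retried, which is exactly what our framework already does automatically.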
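For anyone who wants the flavor of the technique before reading the whole paper: as I understand it, each transaction takes non-blocking "SIREAD" locks on what it reads, the engine records read-write antidependencies between concurrent transactions, and it aborts any transaction that picks up both an inbound and an outbound rw-conflict -- the "pivot" of the dangerous structure present in every snapshot isolation anomaly. The following toy Python sketch of that bookkeeping is my own simplification, not the paper's code; it uses a single lock granularity, ignores phantoms and lock cleanup, and (like the real algorithm) can abort transactions that would actually have been safe:

    # Toy sketch (mine, not the paper's code) of the runtime
    # check: abort when a transaction ends up with both an inbound
    # and an outbound rw-antidependency -- a "pivot".

    class SerializationFailure(Exception):
        pass

    class Txn:
        def __init__(self, name):
            self.name = name
            self.sireads = set()       # non-blocking read marks
            self.in_conflict = False   # someone read what we wrote
            self.out_conflict = False  # we read what someone wrote

    def mark_rw_conflict(reader, writer):
        # reader --rw--> writer: reader saw the version that
        # writer is replacing
        reader.out_conflict = True
        writer.in_conflict = True
        for t in (reader, writer):
            if t.in_conflict and t.out_conflict:
                raise SerializationFailure(t.name)  # abort pivot

    def on_read(txn, item):
        txn.sireads.add(item)          # SIREAD lock: never blocks

    def on_write(txn, item, concurrent):
        for other in concurrent:       # concurrent SIREAD holders
            if other is not txn and item in other.sireads:
                mark_rw_conflict(other, txn)

    # A write-skew schedule like the one in the previous sketch,
    # detected at the second write:
    t1, t2 = Txn("T1"), Txn("T2")
    for item in ("alice", "bob"):
        on_read(t1, item)
        on_read(t2, item)
    on_write(t1, "alice", [t2])        # T2 --rw--> T1
    try:
        on_write(t2, "bob", [t1])      # T1 --rw--> T2: T1 is a pivot
    except SerializationFailure as t:
        print("abort and retry:", t)   # prints: abort and retry: T1

The real implementation, as I read the paper, attaches this state to the lock manager and has to retain SIREAD locks even after a transaction commits, for as long as concurrent transactions remain active.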
Quoting a portion of the conclusions:

"A prototype of the algorithm has been implemented in Oracle Berkeley DB and shown to perform significantly better than two-phase locking in a variety of cases, and often comparably with snapshot isolation.

"One property of Berkeley DB that simplified our implementation was working with page-level locking and versioning. In databases that version and lock at row-level granularity (or finer), additional effort would be required to avoid phantoms, analogous to standard two-phase locking approaches such as multigranularity locking."

Quoting a snippet from the implementation section:

"Making these changes to Berkeley DB involved only modest changes to the source code. In total, only 692 lines of code (LOC) were modified out of a total of over 200,000 lines of code in Berkeley DB."

Michael J. Cahill has since implemented these techniques in InnoDB as part of his PhD work. While Microsoft SQL Server does provide full serializability in an MVCC implementation, I believe they do it with blocking rather than with this newer and faster technique. The paper is a very interesting read, and I fear that if we don't pursue these techniques, InnoDB users will have legitimate bragging rights over PostgreSQL users in an important area.

Oh, and I know that these issues are well known, and that the solution involves predicate locks -- although these won't necessarily be locks which cause blocking.

-Kevin

[1] Michael J. Cahill, Uwe Röhm, Alan D. Fekete. Serializable Isolation for Snapshot Databases. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 729-738. http://doi.acm.org/10.1145/1376616.1376690

[2] http://cahill.net.au/wp-content/uploads/2009/01/real-serializable.pdf