While discussing potential changes to PostgreSQL documentation of
transaction isolation levels, Emmanuel Cecchet pointed out an
intriguing new paper[1] on a new algorithm to provide true
serializable behavior in a MVCC based database, with no additional
blocking; although, there being no such things as a free lunch, there
is an increase in serialization failures under contention.  I have
been hesitant to raise the issue while everyone was busy trying to
wrap up release 8.4; but since that is now in beta testing and PGCon
is fast approaching, I wanted to put the issue out there so that
people have a chance to review it and discuss it.
 
Michael Cahill has given me permission to quote from the paper.  He
has also provided a URL to his personal version of the work[2], which
people may directly access for their personal use, although
redistribution is prohibited by the ACM copyright.  He has asked to be
copied on the discussion here.
 
I know that some here have questioned why anyone would want
serializable transactions.  Our environment has 21 programmers, 21
business process analysts, and four DBAs.  A major part of the time
for this staff is enhancement of existing software and development of
new software.  We have many distinct databases, the largest of which
has a schema of over 300 tables.  (That's well normalized and not
partitioned -- the structure of the data really is that complex.)  We
have over 8,700 queries against these various databases, including
OLTP, reporting, statistics, public access, web queries, etc.  If one
were to go through the well-know techniques to identify all possible
interactions between these queries against these tables, it would not
only be a massive undertaking, the results would immediately be
obsolete.
 
The nice thing about serializable transactions is that if you can show
that a transaction does the right thing when run by itself, you
automatically know that it will function correctly when run in any
mix, or it will fail with a serializable error and can be safely
retried.  Our framework is designed so that serialization errors are
automatically detected and the transaction is retried without any user
interaction or application programming needed -- a serialization
failure appears to the application code and the user the same as
simple blocking.
 
Quoting the paper's abstract:
 
"Many popular database management systems offer snapshot isolation
rather than full serializability. There are well known anomalies
permitted by snapshot isolation that can lead to violations of data
consistency by interleaving transactions that individually maintain
consistency. Until now, the only way to prevent these anomalies was to
modify the applications by introducing artificial locking or update
conflicts, following careful analysis of conflicts between all pairs
of transactions.
 
"This paper describes a modification to the concurrency control
algorithm of a database management system that automatically detects
and prevents snapshot isolation anomalies at runtime for arbitrary
applications, thus providing serializable isolation. The new algorithm
preserves the properties that make snapshot isolation attractive,
including that readers do not block writers and vice versa. An
implementation and performance study of the algorithm are described,
showing that the throughput approaches that of snapshot isolation in
most cases."
 
Quoting a portion of the conclusions:
 
"A prototype of the algorithm has been implemented in Oracle Berkeley
DB and shown to perform significantly better that two-phase locking in
a variety of cases, and often comparably with snapshot isolation.
 
"One property of Berkeley DB that simplified our implementation was
working with page level locking and versioning. In databases that
version and lock at row-level granularity (or finer), additional
effort would be required to avoid phantoms, analogous to standard two
phase locking approaches such as multigranularity locking."
 
Quoting a snippet from the implementation section:
 
"Making these changes to Berkeley DB involved only modest changes to
the source code. In total, only 692 lines of code (LOC) were modified
out of a total of over 200,000 lines of code in Berkeley DB."
 
Michael J. Cahill has since implemented these techniques in InnoDB as
part of his PhD work.  While Microsoft SQL Server does provide full
serializability in an MVCC implementation, I believe they do it with
blocking rather than this newer and faster technique.
 
The paper is a very interesting read, and I fear that if we don't
pursue these techniques, InnoDB users will have legitimate bragging
rights over PostgreSQL users in an important area.
 
Oh, and I know that these issues are well known, and I know that the
solution involves predicate locks; although these won't necessarily be
locks which cause blocking.
 
-Kevin
 
[1] Michael J. Cahill, Uwe Röhm, Alan D. Fekete. Serializable
Isolation for Snapshot Databases. In the Proceedings of the 2008 ACM
SIGMOD international conference on management of data, pages 729-738.
http://doi.acm.org/10.1145/1376616.1376690
 
[2]
http://cahill.net.au/wp-content/uploads/2009/01/real-serializable.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to