Traditional database journalling logged page before-images (to allow a
database to be rolled back to a point in time), page after-images (to
roll forward from a backup for disaster recovery), or both. Either
technique eliminates a single point of failure (the disk).
Interbase originally offered both before- and after-image journalling to
a separate journal server that supported multiple databases.
A write ahead log is a totally different animal. A write ahead log
writes page changes to a single serial file so when a transaction
commits, only the serial log gets flushed to storage rather than all
dirty pages in the cache. It doesn't, however, solve the problem of a
single point of failure without RAID.
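To make that concrete, here is a minimal sketch of the idea in Python.
It is purely illustrative -- the WriteAheadLog class, the JSON record
format, and the file name are all made up, not how Interbase, Falcon,
or any other engine actually lays out its log:

import json
import os

class WriteAheadLog:
    def __init__(self, path):
        # Unbuffered binary append so every write goes straight to the OS.
        self.log = open(path, "ab", buffering=0)

    def commit(self, txn_id, page_changes):
        # Append all of the transaction's page changes as one serial record,
        # then force the log to storage.  The dirty pages themselves can be
        # written back lazily; only this one flush gates the commit.
        record = json.dumps({"txn": txn_id, "changes": page_changes}) + "\n"
        self.log.write(record.encode())
        os.fsync(self.log.fileno())

wal = WriteAheadLog("example.wal")
wal.commit(42, {"page 7": "after-image bytes", "page 9": "after-image bytes"})
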
During the Borland years, Interbase tried to do a write ahead log and
dropped the journalling code and server. When it was pointed out that
this introduced a single point of failure, they abandoned the write
ahead log and concentrated on shadowing.
The Falcon storage engine used a write ahead log so a transaction could
be committed with a single non-buffered write.
I also put a write ahead log for replication messages into NuoDB storage
managers to meet some customers' insistence that every piece of data be
stored on at least two devices before a transaction could be reported as
committed. And one of these days I gotta write one for Amorphous for
the same reason.
Write ahead logs are implemented in almost all commercial database systems.
I've forgotten the details of the InnoDB logs, but they implemented MVCC
with a pointer in their lock manager to a prior version of a record in
their log. It does (or used to) have some crock where it stops working
when the lock space is exhausted.
I haven't a clue as to how contemporary Interbase works.
Almost everyone lies about serializability. Everyone should know the
formal definition: a database is serializable if, for any set of
concurrent transactions, there exists an ordering of those transactions
such that executing them serially in that order yields the same database
state.
Here's a test case: given a database with variables a and b initialized
to 1 and variables c and d initialized to zero, consider two concurrent
transactions A and B. Transaction A copies b to c and bumps a.
Transaction B copies a to d and bumps b.
A serializable database will either deadlock or have c and d with values
of either 1 and 2 or 2 and 1. An MVCC database will have both c and d
with values of 1.
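The test case is easy to simulate. The toy Python below (no real engine
behind it; the function names are made up) runs A and B against
per-transaction snapshots the way an MVCC system would, then runs them
serially in both orders:

def run_mvcc_concurrently():
    committed = {"a": 1, "b": 1, "c": 0, "d": 0}
    snap_a = dict(committed)   # A's snapshot when it starts
    snap_b = dict(committed)   # B's snapshot when it starts
    a_writes = {"c": snap_a["b"], "a": snap_a["a"] + 1}   # A: copy b to c, bump a
    b_writes = {"d": snap_b["a"], "b": snap_b["b"] + 1}   # B: copy a to d, bump b
    committed.update(a_writes)   # the write sets don't overlap, so both commit
    committed.update(b_writes)
    return committed

def run_serially(a_first):
    db = {"a": 1, "b": 1, "c": 0, "d": 0}
    def txn_a(db): db["c"] = db["b"]; db["a"] += 1
    def txn_b(db): db["d"] = db["a"]; db["b"] += 1
    for txn in ([txn_a, txn_b] if a_first else [txn_b, txn_a]):
        txn(db)
    return db

print(run_mvcc_concurrently())   # c == 1 and d == 1: no serial order produces this
print(run_serially(True))        # c == 1, d == 2
print(run_serially(False))       # c == 2, d == 1
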
CockroachDB, which implements MVCC with record timestamps, claims to be
both MVCC and serializable. Some of their literature says they are
"virtually serializable," which, translated from marketing to English,
means "not serializable." I haven't been able to find anything that
says they can handle the above test case. It is possible that they
retain the full record read set and re-read and verify every record
before commit, but they don't say they do, and the cost would be
prohibitive. If anyone knows, I'd like to hear about it.
Two-phase locking without phantom control, however, isn't serializable,
but the concurrency cost of phantom control is too high for most
database systems. Many systems implement a truly serializable mode to
get a marketing check mark that they expect nobody to ever use in
practice (Interbase implemented a two phase locking scheme for tables,
which was both serializable and unusable).
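To illustrate why phantom control matters, here is a toy sketch (not any
real lock manager): T1 read-locks every row that currently matches its
predicate, but nothing stops T2 from inserting a new matching row,
because there is no existing row to lock.

table = [{"dept": "sales", "salary": 100}]
read_locks = set()

def t1_sum_sales():
    # T1 locks and sums every existing row matching dept == "sales".
    rows = [r for r in table if r["dept"] == "sales"]
    for r in rows:
        read_locks.add(id(r))
    return sum(r["salary"] for r in rows)

def t2_insert_sales_row():
    # T2 inserts a new matching row; no lock conflict, because no row lock
    # can cover a row that doesn't exist yet.  Predicate (phantom) control
    # would have blocked this insert.
    table.append({"dept": "sales", "salary": 200})

first = t1_sum_sales()
t2_insert_sales_row()
second = t1_sum_sales()
print(first, second)   # 100 then 300: the new row is a phantom for T1
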
Personally, I believe that while serializability is a sufficient
condition for consistency, it isn't a necessary one. In my book,
consistency means:
1. A transaction sees a consistent view of the database plus its own
updates.
2. A transaction sees only committed data.
3. A transaction can't overwrite any data it couldn't see.
4. The database enforces any additional declared consistency constraints.
All of these work for MVCC, as the sketch below suggests.
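Here is a toy sketch of rules 1 through 3 under MVCC -- purely
illustrative, not Interbase's or anyone else's actual code. Each record
keeps a chain of versions stamped with the writing transaction, a reader
takes the newest version committed as of its snapshot, and a writer may
only replace a version it could see. Rule 4 would be a commit-time
constraint check on top of this:

class UpdateConflict(Exception):
    pass

class Record:
    def __init__(self, value, txn_id):
        self.versions = [(txn_id, value)]   # newest version last

    def read(self, snapshot, committed):
        # Rules 1 and 2: newest version committed as of the reader's snapshot.
        for txn_id, value in reversed(self.versions):
            if txn_id in committed and txn_id <= snapshot:
                return value
        return None

    def write(self, snapshot, committed, txn_id, value):
        newest_txn, _ = self.versions[-1]
        # Rule 3: refuse to overwrite a version the writer couldn't see.
        if newest_txn != txn_id and (newest_txn not in committed
                                     or newest_txn > snapshot):
            raise UpdateConflict("record updated by a transaction we can't see")
        self.versions.append((txn_id, value))

committed = {1}
rec = Record("v1", txn_id=1)
rec.write(snapshot=1, committed=committed, txn_id=2, value="v2")  # saw txn 1's version: ok
try:
    rec.write(snapshot=1, committed=committed, txn_id=3, value="v3")  # conflicts with uncommitted txn 2
except UpdateConflict as e:
    print("rolled back:", e)
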
On 6/3/2022 4:05 AM, Pól Ua L. via Firebird-devel wrote:
Hi again Jim, and thanks for your replies - it's interesting reading about the
history of MVCC inter alia - see below.
Your answers bring up a couple of questions though.
One day I was driving down Route 3 in Manchester, New Hampshire, when it
occurred to me that rather than keeping multiple page images, I could
keep multiple record versions, hopefully on the same page, and with
clever bookkeeping have individual transactions keep track of which of
several record versions they should see. So it solved concurrency
control, transaction backout, garbage collection, and database restart
without journalling.
Q.1) If MVCC doesn't require journalling, then why does Interbase now tout the
fact that it has a Write Ahead Log (WAL - which I assume is a synonym for
journalling)?
From the page (https://en.wikipedia.org/wiki/InterBase):
RESILIENT
Live Backups
Distinguished Data Dumps
Write-Ahead Logging <<--------**
Point-in-Time Recovery
Oracle and MySQL (InnoDB engine) use MVCC and have Redo logs - which (at least
AFAICS) are a WAL by another name.
Refs:
https://docs.oracle.com/cd/E18283_01/server.112/e17120/onlineredo001.htm
https://dev.mysql.com/blog-archive/mysql-8-0-new-lock-free-scalable-wal-design/
========================================================
And, in another reply, there's this:
For what it's worth, David Reed's dissertation was on a
non-transactional distributed directory system. Bernstein and Goodman's
book "proved" that MVCC was serializable, which it most definitely was not.
Q.2) How then do the various MVCC systems implement SERIALIZABLE?
It's quite a confusing topic - there's an excellent article (which I haven't
fully digested yet) here:
https://medium.com/paypal-tech/think-twice-before-dropping-acid-and-throw-your-cap-away-dbe0d6171dc0
which appears to imply that none of the major systems have a true SERIALIZABLE
transaction isolation level?
Thanks to anyone for any input.
Best and regards,
Pól...
--
Jim Starkey, AmorphousDB, LLC
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel