Hi,
Bill Moran wrote:
First off, "clustering" is a word that is too vague to be useful, so
I'll stop using it. There's multi-master replication, where every
database is read-write, then there's master-slave replication, where
only one server is read-write and the rest are read-only. You can
add failover capabilities to master-slave replication. Then there's
synchronous replication, where all servers are guaranteed to get
updates at the same time. And asynchronous replication, where other
servers may take a while to get updates. These descriptions aren't
really specific to PostgreSQL -- every database replication system
has to make design decisions about which approaches to support.
Good explanation!
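
To make the sync/async distinction concrete, here is a toy sketch in Python. It is not how any real replication system is implemented. The names (Node, Primary, commit_sync, commit_async, ship_pending) are made up for illustration. The only point is when the replica sees a change relative to the commit confirmation.

```python
class Node:
    """A toy database node: just a dict of key -> value."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary(Node):
    """Hypothetical read-write node that forwards changes to replicas."""
    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = []  # changes confirmed but not yet shipped (async only)

    def commit_sync(self, key, value):
        # Synchronous: every replica has the change before we confirm.
        self.apply(key, value)
        for replica in self.replicas:
            replica.apply(key, value)
        return "committed"

    def commit_async(self, key, value):
        # Asynchronous: confirm right away, ship the change later.
        self.apply(key, value)
        self.pending.append((key, value))
        return "committed"

    def ship_pending(self):
        # Stands in for whatever background process forwards changes.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, value)
        self.pending.clear()


replica = Node()
primary = Primary([replica])

primary.commit_sync("a", 1)
sync_visible = replica.data.get("a")          # already on the replica

primary.commit_async("b", 2)
async_before = replica.data.get("b")          # not there yet
primary.ship_pending()
async_after = replica.data.get("b")           # arrives after shipping
```

With commit_async, a crash of the primary between the confirmation and ship_pending() would lose the change on the replica. That window is what the synchronous variant closes.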
Synchronous replication is only
really used when two servers are right next to each other with a
high-speed link (probably gigabit) between them.
Why is that so? There is certainly very valuable data that would benefit
from an inter-continental database system. For money transfers, for
example, I'd rather wait half a second for a round trip around the
world to make sure the RDBMS does not 'lose' my money.
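
As a rough sanity check on that half second, here is a back-of-the-envelope calculation in Python. The figures are approximations I'm assuming, not measurements: about 40,075 km for the Earth's circumference and about 200,000 km/s for light in optical fibre (roughly two thirds of its speed in vacuum). Routing detours, switching delays and protocol overhead are ignored, so real round trips will be longer.

```python
EARTH_CIRCUMFERENCE_KM = 40_075      # approximate equatorial circumference
FIBRE_SPEED_KM_PER_S = 200_000       # approximate speed of light in fibre

# Farthest possible peer: half the circumference away, there and back.
one_way_km = EARTH_CIRCUMFERENCE_KM / 2
round_trip_km = 2 * one_way_km

round_trip_s = round_trip_km / FIBRE_SPEED_KM_PER_S
print(f"ideal antipodal round trip: {round_trip_s * 1000:.0f} ms")
```

So even in the worst geographical case, the physical lower bound sits well below half a second, which leaves room for the real-world overhead ignored here.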
PostgreSQL-R is in development, and targeted to allow multi-master,
asynchronous replication without rewriting your application. As
far as I know, it works, but it's still beta.
Sorry, this is nitpicking, but for some reason (see current naming
discussion on -advocacy :-) ), it's "Postgres-R".
Additionally, Postgres-R is considered to be a *synchronous* replication
system, because once you get your commit confirmation, your transaction
is guaranteed to be deliverable and *committable* on all running nodes
(i.e. it's durable and consistent). Or, to put it another way: asynchronous
systems have to deal with conflicting, but already committed
transactions - Postgres-R does not.
Admittedly, this is slightly less restrictive than requiring that a
transaction be *committed* on all nodes before confirming the commit to
the client. But as long as a database session is tied to a single node,
this optimization does not alter any transactional semantics. And
despite that restriction, which holds in most real-world setups anyway,
I still consider this to be synchronous replication.
[ To get a strictly synchronous system with Postgres-R, you'd have to
delay read-only transactions on a node which hasn't applied all remote
transactions yet. In most cases, that's unwanted. Instead, a consistent
snapshot is enough, just as if the transaction started *before* the
remote ones which still need to be applied. ]
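
A toy model in Python may help to show what I mean. This is emphatically not Postgres-R code. The Cluster and Node classes are hypothetical, the "broadcast" is just a shared list, everything is single-threaded, and conflict detection is left out entirely. It only illustrates the idea: the commit is confirmed once the writeset has its place in an agreed total order, while other nodes apply it later and in the meantime serve reads from a consistent older snapshot.

```python
class Cluster:
    """Toy stand-in for a reliable, totally ordered broadcast channel."""
    def __init__(self):
        self.log = []  # agreed-upon order of writesets

    def deliver(self, writeset):
        self.log.append(writeset)


class Node:
    def __init__(self, cluster):
        self.cluster = cluster
        self.data = {}
        self.applied = 0  # number of log entries applied on this node

    def catch_up(self):
        # Apply all delivered-but-unapplied writesets, in log order.
        for writeset in self.cluster.log[self.applied:]:
            self.data.update(writeset)
        self.applied = len(self.cluster.log)

    def commit(self, writeset):
        # Confirm once the writeset is delivered in the total order.
        # The local node applies it; remote nodes catch up on their own.
        self.cluster.deliver(writeset)
        self.catch_up()
        return "committed"

    def read(self, key):
        # Reads see a consistent prefix of the log, possibly an older one.
        return self.data.get(key)


cluster = Cluster()
a, b = Node(cluster), Node(cluster)

a.commit({"balance": 100})
a.commit({"balance": 80})

older_snapshot = b.read("balance")   # b has not applied anything yet
b.catch_up()
current = b.read("balance")          # b now agrees with a
```

The read on b before catch_up() behaves as if it had started before both of a's transactions. A strictly synchronous variant would have to call catch_up() at the start of every read, which is the delay mentioned above.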
BTW: does anyone know of a link that describes these high-level concepts?
If not, I think I'll write this up formally and post it.
Hm.. some time before 8.3 was released, we had lots of discussions on
-docs about the "high availability and replication" section of the
PostgreSQL documentation. I'd have liked to add these fundamental
concepts, but Bruce - rightly - wanted to keep focused on existing
solutions. And unfortunately, most existing solutions are async,
single-master. So explaining all these wonderful theoretical concepts only
to state that there are no real solutions would have been silly.
Regards
Markus