Eliot, On 11/23/2010 09:43 PM, Eliot Gable wrote: > I know there has been a lot of talk about replication getting built into > Postgres and I know of many projects that aim to fill the role. However, > I have not seen much in the way of a serious attempt at multi-master > write scaling.
Postgres-XC and Postgres-R are two pretty serious projects, IMO. > I understand the fundamental problem with write scaling > across multiple nodes is Disk I/O and inter-node communication latency > and that in the conventional synchronous, multi-master replication type > setup you would be limited to the speed of the slowest node, That's not necessarily true for Postgres-R, which is why I call it an 'eager' solution (as opposed to fully synchronous). While it guarantees that all transactions that got committed *will* be committable on all nodes at some time in the future, nodes may still lag behind others. Thus, even a slower / busy node doesn't hold back the others, but may serve stale data. Ideally, your load balancer accounts for that and gives that node a break or at least reduces the amount of transactions going to that node, so it can catch up again. Anyway, that's pretty Postgres-R specific. > plus the > communication protocol overhead and latency. However, it occurs to me > that if you had a shared disk system via either iSCSI, Fiber Channel, > NFS, or whatever (which also had higher I/O capabilities than a single > server could utilize), if you used a file system that supported locks on > a particular section (extent) of a file, it should theoretically be > possible for multiple Postgres instances on multiple systems sharing the > database to read and write to the database without causing corruption. Possible, yes. Worthwile to do, probably not. > Has anyone put any thought into what it would take to do this in > Postgres? Is it simply a matter of making the database file interaction > code aware of extent locking, or is it considerably more involved than > that? It also occurs to me that you probably need some form of > transaction ordering mechanism across the nodes based on synchronized > timestamps, but it seems Postgres-R has the required code to do that > portion already written. If you rely on such an ordering, why use additional locks. That seems like a waste of resources compared to Postgres-R. Not to mention the introduction of a SPOF with the SAN. > Wouldn't this type of setup be far > simpler to implement That's certainly debatable, yes. I obviously think that the benefit per cost ratio for Postgres-R is better :-) > and provide better scalability than trying to do > multi-master replication using log shipping or binary object shipping or > any other techniques? It's more similar to replication using two phase commit, which provably doesn't scale (see for example [1]) And using a SAN for locking certainly doesn't beat 2PC via an equally modern/expensive interconnect. > Wouldn't it also be far more efficient since you > don't need to have a copy of your data on each master node and therefor > also don't have to ship your data to each node and have each node > process it? You have to ship it from the SAN to the node, so I definitely don't think so, but see this as an argument against it. Each having a local copy and only exchange locking information and transactional changes sounds like much less traffic overall. Regards Markus Wanner [1]: The Dangers of Replication and a Solution, Gray et al, In Proc. of the SIGMOD Conf., 1996, http://research.microsoft.com/apps/pubs/default.aspx?id=68247 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers