Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/01/07 08:12, chris smith wrote: >> Ever read anything on how myspace is laid out? The big ones need >> replication to handle the traffic. > > Actually no. > > http://highscalability.com/livejournal-architecture > > "Using MySQL replication only takes you so far." (Yeh it's mysql but > the point is valid regardless). > "You can't keep adding read slaves and scale." > > A lot use sharding now to keep scaling (limiting to "X" users/accounts > per database system and just keep adding more database servers for the > next "X" accounts). Hmmm. Horizontally partitioning your "database" into multiple physical databases is 10+ years old. At least. This is how DEC implemented the billing database for DirecTV, and how we implemented *large* toll systems in the US Northeast. In addition to the account databases, you need a "reference" database for tables that can't be partitioned by account, be able to run queries across databases, and middleware that knows how to direct transactions to the correct database. - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFG2c6QS9HxQb37XmcRAkuGAJ4thc+owEX8OJl1qaTrY+krHkMIwgCfWver aUEiSPkrw4Gnf7dI0ftSVJ0= =4wFD -END PGP SIGNATURE- ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
"chris smith" <[EMAIL PROTECTED]> wrote: > > > Ever read anything on how myspace is laid out? The big ones need > > replication to handle the traffic. > > Actually no. > > http://highscalability.com/livejournal-architecture > > "Using MySQL replication only takes you so far." (Yeh it's mysql but > the point is valid regardless). > "You can't keep adding read slaves and scale." > > A lot use sharding now to keep scaling (limiting to "X" users/accounts > per database system and just keep adding more database servers for the > next "X" accounts). I got the impression that they hadn't moved _all_ of their DB needs to sharding. Just the ones that exceeded the scalability of replication, but they don't explicitly say, IIRC. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
> Ever read anything on how myspace is laid out? The big ones need > replication to handle the traffic. Actually no. http://highscalability.com/livejournal-architecture "Using MySQL replication only takes you so far." (Yeh it's mysql but the point is valid regardless). "You can't keep adding read slaves and scale." A lot use sharding now to keep scaling (limiting to "X" users/accounts per database system and just keep adding more database servers for the next "X" accounts). Myspace info here: http://highscalability.com/myspace-architecture At 3mill users: - split its user base into chunks of 1 million accounts and put all the data keyed to those accounts in a separate instance of SQL Server I'm sure there's replication behind the scenes to help with read-queries but it's definitely not a magic wand that will fix everything. -- Postgresql & php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Markus Schiltknecht <[EMAIL PROTECTED]> wrote: > > Hi, > > Bill Moran wrote: > > While true, I feel those applications are the exception, not the rule. > > Most DBs these days are the blogs and the image galleries, etc. And > > those don't need or want the overhead associated with synchronous > > replication. > > Uhm.. do blogs and image galleries need replication at all? Ever read anything on how myspace is laid out? The big ones need replication to handle the traffic. > I'm thinking more of the business critical applications, where high > availability is a real demand - and where your data *should* better be > distributed among multiple data centers just to avoid a single point of > failure. Agreed. Judged by importance, they're bigger. Judged by raw numbers, I suspect that they're less prominent than the blogs and image galleries. I guess it depends on your viewpoint. > for most other stuff MySQL is good enough I hate when people say that. Any company that says that is of limited viability in my opinion. For one thing, saying it's "good enough" is setting yourself up for trouble when your company expands its requirements. > > It's synchronous for the reason you describe, > > but it's asynchronous because a query that has returned successfully > > is not _guaranteed_ to be committed everywhere yet. Seems like we're > > dealing with a limitation in the terminology :) > > Certainly! But sync and async replication are so well known and used > terms... on the other hand, I certainly agree that in Postgres-R, the > nodes do not process transactions synchronously, but asynchronous. Good point. > Maybe it's really better to speak of eager and lazy replication, as in > some literature (namely the initial Postgres-R paper of Bettina Kemme). > > > This could potentially be a problem on (for example) a web application, > > where a particular user's experience may be load-balanced to another > > node at any time. Of course, you just have to write the application > > with that knowledge. > > IMO, such heavily dynamic load-balancing is rarely useful. > > With application support, it's easily doable: let the first transaction > on node A query the (global) transaction identifier and after connecting > to the next node B, ask that to wait until that transaction has committed. > > It gets a little harder without application support: the load balancer > would have to keep track of sessions and their last (writing) transaction. > > Again, thank you for pointing this out. Keep up the good work. I wish I had some spare cycles to work on Postgres-R -- it seems like a very interesting problem. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Monday 2007-08-27 08:04, Andrew Sullivan wrote: > On Sat, Aug 25, 2007 at 11:13:45AM -0400, Tom Lane wrote: > > In case you hadn't noticed the disconnect between these statements: > > if they have to be that close together, there *will* be a single point > > of failure. Fire in your data center, for instance, will take out every > > copy of your data. So as a "high availability" solution I don't find > > it all that compelling. > > Indeed. There's another issue, too, which you have to spend some > time reading the manual to get. The clustering stuff is _yet > another_ table type, with subtly different semantics from other table > types. As usual, this means that you can blow off your foot by > mixing table types in a transaction. As near as I can tell, the > cluster table type (I disremeber the name of it) cannot be run in > strict mode, either. > > To answer the OP's question, you can do some "cluster-like" things by > doing hardware clustering -- two machines attached to a RAID with > some sort of hardware fail-over in place. > > I think that the MySQL cluster stuff wasn't intended as an HA > feature, though (although they might well be selling it that way). > It was a way to scale many small systems for certain kinds of > workloads. My impression is that in most cases, it's a SQL-ish > solution to a problem where someone decided to use the SQL nail > because that's the hammer they had. I can think of ways you could > use it, and I'm not surprised that Some Giant Corp is doing so. But > I'd be astonished if someone used it for truly valuable data. I > would think very hard about the qualifications of someone who > proposed using it for financial data. > If it was developed by Ericson for Telco purposes then it would be designed to be wicked fast for OLTP (billing-switching is an OLTP application) with VERY high up time and reliable (customers and on occasion regulators get angry when the phones dont work). It wouldn't matter if it can be geographically distributed. If the switching center catches fire you're hosed anyway. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Hi, Bill Moran wrote: While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. Uhm.. do blogs and image galleries need replication at all? I'm thinking more of the business critical applications, where high availability is a real demand - and where your data *should* better be distributed among multiple data centers just to avoid a single point of failure. for most other stuff MySQL is good enough I find that line fuzzy. Yeah, it is. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) Certainly! But sync and async replication are so well known and used terms... on the other hand, I certainly agree that in Postgres-R, the nodes do not process transactions synchronously, but asynchronous. Maybe it's really better to speak of eager and lazy replication, as in some literature (namely the initial Postgres-R paper of Bettina Kemme). This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. IMO, such heavily dynamic load-balancing is rarely useful. With application support, it's easily doable: let the first transaction on node A query the (global) transaction identifier and after connecting to the next node B, ask that to wait until that transaction has committed. It gets a little harder without application support: the load balancer would have to keep track of sessions and their last (writing) transaction. Again, thank you for pointing this out. Regards Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
In response to Markus Schiltknecht <[EMAIL PROTECTED]>: > Hi, > > Bill Moran wrote: > > First off, "clustering" is a word that is too vague to be useful, so > > I'll stop using it. There's multi-master replication, where every > > database is read-write, then there's master-slave replication, where > > only one server is read-write and the rest are read-only. You can > > add failover capabilities to master-slave replication. Then there's > > synchronous replication, where all servers are guaranteed to get > > updates at the same time. And asynchronous replication, where other > > servers may take a while to get updates. These descriptions aren't > > really specific to PostgreSQL -- every database replication system > > has to make design decisions about which approaches to support. > > Good explanation! > > > Synchronous replication is only > > really used when two servers are right next to each other with a > > high-speed link (probably gigabit) between them. > > Why is that so? There's certainly very valuable data which would gain > from an inter-continental database system. For money transfers, for > example, I'd rather wait half a second for a round trip around the > world, to make sure the RDBS does not 'loose' my money. While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. > > PostgreSQL-R is in development, and targeted to allow multi-master, > > asynchronous replication without rewriting your application. As > > far as I know, it works, but it's still beta. > > Sorry, this is nitpicking, but for some reason (see current naming > discussion on -advocacy :-) ), it's "Postgres-R". Sorry. > Additionally, Postgres-R is considered to be a *synchronous* replication > system, because once you get your commit confirmation, your transaction > is guaranteed to be deliverable and *committable* on all running nodes > (i.e. it's durable and consistent). Or put it another way: asynchronous > systems have to deal with conflicting, but already committed > transactions - Postgres-R does not. I find that line fuzzy. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) > Certainly, this is slightly less restrictive than saying that a > transaction needs to be *committed* on all nodes, before confirming the > commit to the client. But as long as a database session is tied to a > node, this optimization does not alter any transactional semantics. And > despite that limitation, which is mostly the case in reality anyway, I > still consider this to be synchronous replication. This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. > [ To get a strictly synchronous system with Postgres-R, you'd have to > delay read only transactions on a node which hasn't applied all remote > transactions, yet. In most cases, that's unwanted. Instead, a consistent > snapshot is enough, just as if the transaction started *before* the > remote ones which still need to be applied. ] Agreed. > > BTW: does anyone know of a link that describes these high-level concepts? > > If not, I think I'll write this up formally and post it. > > Hm.. somewhen before 8.3 was released, we had lots of discussions on > -docs about the "high availability and replication" section of the > PostgreSQL documentation. I'd have liked to add these fundamental > concepts, but Bruce - rightly - wanted to keep focused on existing > solutions. And unfortunately, most existing solutions are async, > single-master. So explaining all these wonderful theoretic concepts only > to state that there are no real solutions would have been silly. Someone else posted a link, and the docs look pretty comprehensive at this point ... enough so that I'm not going to bother writing up my own explanation. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Hi, Bill Moran wrote: First off, "clustering" is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. Good explanation! Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. Why is that so? There's certainly very valuable data which would gain from an inter-continental database system. For money transfers, for example, I'd rather wait half a second for a round trip around the world, to make sure the RDBS does not 'loose' my money. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. Sorry, this is nitpicking, but for some reason (see current naming discussion on -advocacy :-) ), it's "Postgres-R". Additionally, Postgres-R is considered to be a *synchronous* replication system, because once you get your commit confirmation, your transaction is guaranteed to be deliverable and *committable* on all running nodes (i.e. it's durable and consistent). Or put it another way: asynchronous systems have to deal with conflicting, but already committed transactions - Postgres-R does not. Certainly, this is slightly less restrictive than saying that a transaction needs to be *committed* on all nodes, before confirming the commit to the client. But as long as a database session is tied to a node, this optimization does not alter any transactional semantics. And despite that limitation, which is mostly the case in reality anyway, I still consider this to be synchronous replication. [ To get a strictly synchronous system with Postgres-R, you'd have to delay read only transactions on a node which hasn't applied all remote transactions, yet. In most cases, that's unwanted. Instead, a consistent snapshot is enough, just as if the transaction started *before* the remote ones which still need to be applied. ] BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. Hm.. somewhen before 8.3 was released, we had lots of discussions on -docs about the "high availability and replication" section of the PostgreSQL documentation. I'd have liked to add these fundamental concepts, but Bruce - rightly - wanted to keep focused on existing solutions. And unfortunately, most existing solutions are async, single-master. So explaining all these wonderful theoretic concepts only to state that there are no real solutions would have been silly. Regards Markus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Bill Moran <[EMAIL PROTECTED]> writes: First off, "clustering" is a word that is too vague to be useful, so I'll stop using it. Right. MySQL Cluster, on the other hand, is a very specific technology. http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster.html It is, however, capable of being d*mn fast for read-mostly workloads that can fit their whole dataset into RAM --- and with the price of There are however some things that won't work (or work well) using NDB that will drive you crazy. VARCHAR's aren't varchars. They are fixed to the max length. There's also a limit on overall row length which is pretty small (don't remember what it is off hand). Cluster doesn't really enjoy processing queries with left outer joins or joins in general -- what will take <1s on a single mysql instance can take several seconds on the cluster. Some of this is because the storage nodes can't do it so copy all the tables involved to the api nodes for processing. Even on a fast network this takes a lot of time. You can't have a query with two OR'd LIKE clauses. Instead you have to break them into their own query and UNION the result. You can't insert/update/delete more than 32000 rows at a time. In practice (and no I don't understand why) sometimes this really means more like 1. Most annoying however is that to make a change to the database schema you have to shut down all the nodes except one. Not sure if this is typical of other systems or not, but it kind of sucks :/ There's other things too, but I don't remember what they are until I build something that works fine with a single mysql instance and then doesn't on the cluster... -philip ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Aug 27, 2007, at 11:04 AM, Andrew Sullivan wrote: It was a way to scale many small systems for certain kinds of workloads. My impression is that in most cases, it's a SQL-ish solution to a problem where someone decided to use the SQL nail because that's the hammer they had. I can think of ways you could The underlying table type can be used directly, similarly to how the "DB" table type is built on the DB library. The issue is whether you can do that without some strange licensing problems. I wouldn't use it for "permanent" data. It would be awesome for a cache of the "hot" data in your app, and perhaps even for realized views that speed up your interactive app. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Sat, Aug 25, 2007 at 11:13:45AM -0400, Tom Lane wrote: > In case you hadn't noticed the disconnect between these statements: > if they have to be that close together, there *will* be a single point > of failure. Fire in your data center, for instance, will take out every > copy of your data. So as a "high availability" solution I don't find > it all that compelling. Indeed. There's another issue, too, which you have to spend some time reading the manual to get. The clustering stuff is _yet another_ table type, with subtly different semantics from other table types. As usual, this means that you can blow off your foot by mixing table types in a transaction. As near as I can tell, the cluster table type (I disremeber the name of it) cannot be run in strict mode, either. To answer the OP's question, you can do some "cluster-like" things by doing hardware clustering -- two machines attached to a RAID with some sort of hardware fail-over in place. I think that the MySQL cluster stuff wasn't intended as an HA feature, though (although they might well be selling it that way). It was a way to scale many small systems for certain kinds of workloads. My impression is that in most cases, it's a SQL-ish solution to a problem where someone decided to use the SQL nail because that's the hammer they had. I can think of ways you could use it, and I'm not surprised that Some Giant Corp is doing so. But I'd be astonished if someone used it for truly valuable data. I would think very hard about the qualifications of someone who proposed using it for financial data. A -- Andrew Sullivan | [EMAIL PROTECTED] Users never remark, "Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath." --Damien Katz ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Aug 25, 2007, at 8:12 AM, Phoenix Kiula wrote: The sentence that caught my attention is "Nokia, Alcatel and Nortel are all building real-time network nodes on top of MySQL Cluster." My experiences with MySQL so far have been less than exhilerating (only tried it for our web stuff, which is not much so far but increasingly rapidly) but I have seen a lot of talk about MySQL clusters. Is there something similar in the PG world? PG Clusters? MySQL Cluster is a specific product. It is not just mysql databases lumped together to make a big mysql DB. It is a memory-based database that requires at least 2 machines to run, and is not usefully redundant and distributed until you have many machines. The telco's use it for their logging and switching infrastructure where you need 100% uptime and wicked fast response (thus memory based.) And you'll note it was developed by Ericsson... There is no equivalent in Postgres. There are several replication choices for Postgres. Google will find them for you, or just look on the postgres.org pages for the list. We use slony1 and it works very well for our need. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Sat, Aug 25, 2007 at 10:18:25AM -0400, Bill Moran wrote: > "Phoenix Kiula" <[EMAIL PROTECTED]> wrote: > > > > We're moving from MySQL to PG, a move I am rather enjoying, but > > we're currently running both databases. As we web-enable our > > financial services in fifteen countries, I would like to recommend > > the team that we move entirely to PG. > > > > In doing research on big installations of the two databases, I > > read this from a MySQL senior exec on Slashdot: > > Senior MySQL exec means this is a marketing blurb, which means it's > exaggerated, lacking any honest assessment of challenges and > difficulties, and possibly an outright lie. I've no doubt that > MySQL can do clusters if you know what you're doing, but if you want > the facts, you're going to have to look deeper than that obviously > biased quote. I seem to remember a forum thread with someone having > considerable difficulty with MySQL cluster, and actual MySQL > employees jumping in to try to help and no solution ever found. > Anyone have that link lying around? I think this is the one. http://forums.mysql.com/read.php?25,93181,93181 Cheers, David. -- David Fetter <[EMAIL PROTECTED]> http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Bill Moran wrote: BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. Chapter 24 - http://www.postgresql.org/docs/8.2/interactive/high-availability.html is a recent addition to the manual that starts to explain most of these terms. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Bill Moran <[EMAIL PROTECTED]> writes: > First off, "clustering" is a word that is too vague to be useful, so > I'll stop using it. Right. MySQL Cluster, on the other hand, is a very specific technology. http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster.html It's interesting but far from an all-purpose solution. I am no expert but I believe these are true statements: * The data is all kept in memory, with *no* on-disk backup. This is why it's so fast. As for reliability, the rationale is that if you have enough replicated nodes then you'll never lose all of them at once. * It's a synchronous multi-master system, meaning that performance is only good if there is very low communication delay between the nodes. Not only do they have to be all on the same LAN segment, MySQL says it had better be at least 100MB Ethernet. In case you hadn't noticed the disconnect between these statements: if they have to be that close together, there *will* be a single point of failure. Fire in your data center, for instance, will take out every copy of your data. So as a "high availability" solution I don't find it all that compelling. It is, however, capable of being d*mn fast for read-mostly workloads that can fit their whole dataset into RAM --- and with the price of RAM what it is, that's not such a big limitation, though remember you need N servers with that much RAM not only one. (For write-mostly workloads I imagine it doesn't scale out so well, because of the costs of updating multiple servers synchronously.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
"Phoenix Kiula" <[EMAIL PROTECTED]> wrote: > > We're moving from MySQL to PG, a move I am rather enjoying, but we're > currently running both databases. As we web-enable our financial > services in fifteen countries, I would like to recommend the team that > we move entirely to PG. > > In doing research on big installations of the two databases, I read > this from a MySQL senior exec on Slashdot: Senior MySQL exec means this is a marketing blurb, which means it's exaggerated, lacking any honest assessment of challenges and difficulties, and possibly an outright lie. I've no doubt that MySQL can do clusters if you know what you're doing, but if you want the facts, you're going to have to look deeper than that obviously biased quote. I seem to remember a forum thread with someone having considerable difficulty with MySQL cluster, and actual MySQL employees jumping in to try to help and no solution ever found. Anyone have that link lying around? In any event, replication is a large and complex topic. To do it well takes research, planning, and know-how. Anyone who tells you their solution will just drop in and work is either lying or charging you a bunch of money for their consultants to investigate your scenario and set it up for you. First off, "clustering" is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. PostgreSQL has some built-in features to allow synchronous multi-master database replication. Two-phase commit allows you to reliably commit transactions to multiple servers concurrently, but it requires support at the application level, which will require you to rewrite any existing applications. Pgcluster is multi-master synchronous replication, but I believe it's still in beta. Note that no synchronous replication system works well over geographically large distances. The time required for the masters to synchronize over (for example) the Internet kills performance to the point of uselessness. Again, this is not a PostgreSQL problem, MSSQL suffers the same problem. Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. pgpool supports multi-master synchronous replication as well as failover. Slony supports master-slave asynchronous replication and works _very_ well over long distances (such as from an east coast to a west coast datacenter) Once you've looked at your requirements, start looking at the tool that matches those requirements, and I think you'll find what you need. BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
We're moving from MySQL to PG, a move I am rather enjoying, but we're currently running both databases. As we web-enable our financial services in fifteen countries, I would like to recommend the team that we move entirely to PG. In doing research on big installations of the two databases, I read this from a MySQL senior exec on Slashdot: QUOTE--- "...We did it ourselves with MySQL Cluster. That's technology originally developed by Ericsson - and today it is an absolute category leader in telco and networking. Nokia, Alcatel and Nortel are all building real-time network nodes on top of MySQL Cluster. No one else has a main-memory based shared-nothing cluster with that high throughput and availability. Or take the new Monitoring and Advisory Service that we are launching as part of MySQL Enterprise - this is a novel innovatoin built on the feedback from our most advanced users and customers." --UNQUOTE The sentence that caught my attention is "Nokia, Alcatel and Nortel are all building real-time network nodes on top of MySQL Cluster." My experiences with MySQL so far have been less than exhilerating (only tried it for our web stuff, which is not much so far but increasingly rapidly) but I have seen a lot of talk about MySQL clusters. Is there something similar in the PG world? PG Clusters? Google search comes up (apart from the "cluster" command) with stuff like this in the mailing list archives -- http://snipr.com/1pxkf -- which I am not sure is very heartening piece of news. I'd appreciate any thoughts or experiences, or starting points for what I should start reading in terms of PG in high availability situations, including replication and whatnot. Am I speaking of something like "EnterpriseDB" (is that Postgresql with tech support, or a different DB?) TIA! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly