Hi all, I have a general question on scaling PostgreSQL for unlimited throughput, based on some experience.

TL;DR: My question is: given that the work-load on any secondary/standby database server is almost the same as that of the master database server, is there any point to bother with PgPool-II to route query activity over the hot standby's, is it instead not better to just increase the power of the master database system? Is there any trick that can really get to massive scalability with the database?

Background: Let's say we're building a massive NSA citizens surveillance system where we process every call and every email of every person on earth to find dissidents and link this to their financial transactions travel logs like airline bookings and Uber rides, to find some bogus charges that FBI agents could use to put any dissident in prison as soon as possible. And so we need a system that infinitely scales.

OK, I'm kidding (but keep thinking of adverse effects of our IT work) but the point is a massive data processing system. We know that parallelizing all work flows is the key to keep up. Processing the speech and emails and messages and other transactions is mostly doable by throwing hardware at it to parallelize the workflows.

There's always going to be one common bottleneck: that database.

In my experience of parallelizing workflow processes, I can hammer my PostgreSQL database and all I can do to keep up with that is broadening the IO pipeline and looking at the balance of CPU and IO to make sure that it's balanced at near 100% and as I add more CPU bandwidth I add more IO bandwidth and so on to keep those gigabytes flowing and the CPUs churning. But it's much harder with the database than with the message transformations (natural language understanding, data extraction, image processing, etc.)

I have set up a hot standby database which I thought would just keep track with the master, and which I could use to run queries while the insert, update, and delete operations would all go against the master db. What I discovered is that the stress on the hot standby systems is significant just to keep up! The replaying of these logs takes significant resources, so much that if I use a less powerful hardware for the secondary, it tends to fall behind and ultimately bails out because it cannot process the log stream.

So, if my secondary is so busy already with just keeping up to date with the master db, and I cannot use a significantly smaller hardware, how can I put a lot of extra query load on these secondary systems? My argument is GENERAL not "show me your schema", etc. I am talking about principles. I read it somewhere that you need to dimension these secondaries / standby servers about the same capacity as the master server. And that means that the standby servers are about as busy as the master server. And that means that as you scale this up, the scaling is actually quite inefficient. I have to copy all that data while the receiving end of all that data is as busy receiving this data as the master server is with processing the actual transactions.

Doesn't that mean that it's better to just scale up the master system as much as possible while the standby servers are only a means of fault tolerance but never actually improved performance? In other words there is no real benefit of running read/query-only workloads on the secondaries and routing updates to the primary, because the background workload is replicated with every standby server and is not significantly less than the workload on the master server.

And in other words, isn't there a way to replicate that is more efficient? Or are there hard limits? Again, I'm talking principles.

For example, if I just make exact disk copies of the data tables on the SCSI bus level (like RAID-1) for block write transactions while I distribute the block read transactions over the RAID-1 spindles, again, most of my disks are still occupied with the write transactions because they all must write everything while I can distribute only the read activity. I suppose I can use some tricks to avoid seek time by scheduling reads to those disks that are currently writing to the same cylinder (I know that's moot with SSDs but there is some locality issues even for DDR RAM access, so the principle still holds). I suppose I could tweak the mirrors, to track the master write with a slight delay so as to allow some potential to re-organize blocks so as to write contiguous blocks or blocks that go to the same track. But this type of write scheduling is what OSs do out of a cache.

So, my question is: isn't there any trick that can really get to massive scalability with the database? Should I even bother with PgPool-II to route query activity over hot standbys?  I can buy two boxes of n CPUs and disk volume to run as master and slave, or I can spend the same money to buy a single system with twice the CPU cores and a twice as wide IO path and disks. Why would I do anything other but to just increase that master db server?

regards,
-Gunther




Reply via email to