Hi all, I have a general question, based on some experience, about
scaling PostgreSQL for unlimited throughput.
TL;DR: Given that the workload on any secondary/standby database server
is almost the same as that of the master database server, is there any
point in bothering with PgPool-II to route query activity to the hot
standbys, or is it better to just increase the power of the master
database system? Is there any trick that really gets you to massive
scalability with the database?
Background: Let's say we're building a massive NSA citizen-surveillance
system where we process every call and every email of every person on
earth to find dissidents, and link this to their financial transactions
and travel logs (airline bookings, Uber rides) to find some bogus
charges that FBI agents could use to put any dissident in prison as soon
as possible. So we need a system that scales infinitely.
OK, I'm kidding (but do keep thinking about the adverse effects of our
IT work); the point is a massive data processing system. We know that
parallelizing all workflows is the key to keeping up. Processing the
speech and emails and messages and other transactions is mostly doable
by throwing hardware at it to parallelize the workflows.
There's always going to be one common bottleneck: the database.
In my experience of parallelizing workflow processes, I can hammer my
PostgreSQL database, and all I can do to keep up with that is broaden
the IO pipeline and watch the balance of CPU and IO, keeping both near
100%: as I add more CPU bandwidth I add more IO bandwidth, and so on, to
keep the gigabytes flowing and the CPUs churning. But this is much
harder with the database than with the message transformations (natural
language understanding, data extraction, image processing, etc.).
I have set up a hot standby database which I thought would simply keep
pace with the master, and which I could use to run queries while the
insert, update, and delete operations would all go against the master
db. What I discovered is that the stress on the hot standby system is
significant just from keeping up! Replaying the WAL takes significant
resources, so much so that if I use less powerful hardware for the
secondary, it tends to fall behind and ultimately bails out because it
cannot keep up with the log stream.
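As an aside, one way to quantify that lag (a sketch, using the
PostgreSQL 10+ function and column names; pre-10 versions spell these
differently):

    -- On the standby: how far WAL has been received vs. replayed,
    -- and how old the last replayed transaction is.
    SELECT pg_last_wal_receive_lsn()                AS received,
           pg_last_wal_replay_lsn()                 AS replayed,
           now() - pg_last_xact_replay_timestamp()  AS replay_delay;

    -- On the master: replay lag per standby, in bytes.
    SELECT application_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
               AS replay_lag_bytes
      FROM pg_stat_replication;

With an underpowered secondary, replay_lag_bytes just keeps growing
until the standby gives up.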
So, if my secondary is already so busy just keeping up to date with the
master db, and I cannot use significantly smaller hardware, how can I
put a lot of extra query load on these secondary systems? My argument is
GENERAL, not a "show me your schema" matter; I am talking about
principles. I read somewhere that you need to dimension these
secondary/standby servers at about the same capacity as the master
server. That means the standby servers are about as busy as the master
server. And that means that as you scale this up, the scaling is
actually quite inefficient: I have to copy all that data, and the
receiving end is about as busy applying it as the master server is
processing the actual transactions.
Doesn't that mean it's better to just scale up the master system as much
as possible, while the standby servers serve only as a means of fault
tolerance and never actually improve performance? In other words, there
is no real benefit to running read/query-only workloads on the
secondaries while routing updates to the primary, because the background
replication workload is duplicated on every standby server and is not
significantly less than the workload on the master server.
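For concreteness, the kind of PgPool-II setup I'm questioning looks
roughly like this (a pgpool.conf sketch; the host names are made up, and
the parameter names are the PgPool-II 4.x-era spellings):

    # Writes go to the master; SELECTs are spread over all backends.
    backend_hostname0 = 'master.example.com'
    backend_port0     = 5432
    backend_weight0   = 1    # share of the read load

    backend_hostname1 = 'standby1.example.com'
    backend_port1     = 5432
    backend_weight1   = 1

    load_balance_mode     = on        # distribute SELECTs
    master_slave_mode     = on        # route writes to the primary only
    master_slave_sub_mode = 'stream'  # streaming replication

My point is that every backend listed there still pays nearly the full
WAL-replay cost before it serves a single SELECT.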
Put another way: isn't there a way to replicate that is more efficient?
Or are there hard limits? Again, I'm talking principles.
For example, suppose I make exact disk copies of the data tables at the
SCSI bus level (like RAID-1) for block writes while distributing the
block reads over the RAID-1 spindles. Again, most of my disks are still
occupied with the writes, because every disk must write everything,
while only the read activity can be distributed. I suppose I could use
some tricks to avoid seek time, such as scheduling reads to whichever
disk currently has its head near the target cylinder (I know that's moot
with SSDs, but there are locality issues even for DDR RAM access, so the
principle still holds). I suppose I could also tweak the mirrors to
track the master's writes with a slight delay, leaving some room to
reorganize blocks so that contiguous blocks, or blocks destined for the
same track, get written together. But this kind of write scheduling is
what operating systems already do out of their caches.
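The closest thing I know of to "more efficient replication" in
PostgreSQL itself is logical replication (PostgreSQL 10+), which at
least lets me ship only the tables the read workload needs instead of
replaying every block change. A minimal sketch (the table, host, and
user names are made up):

    -- On the master: publish only what the read side queries.
    CREATE PUBLICATION read_side FOR TABLE messages, transactions;

    -- On the replica (a regular instance, not a physical standby):
    CREATE SUBSCRIPTION read_side_sub
        CONNECTION 'host=master.example.com dbname=intake user=repl'
        PUBLICATION read_side;

But I'm not convinced that helps in principle either: the subscriber
still has to execute every row change, so the apply work may just be
moved around rather than reduced.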
So, my question is: is there any trick that really gets you to massive
scalability with the database? Should I even bother with PgPool-II to
route query activity to the hot standbys? I can buy two boxes with n
CPUs and disk volume each to run as master and slave, or I can spend the
same money on a single system with twice the CPU cores, twice the IO
path width, and twice the disks. Why would I do anything other than just
beef up that master db server?
regards,
-Gunther