Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Thu, Nov 19, 2020 at 10:00 AM Stephen Frost <sfr...@snowman.net> wrote:
> > * Thomas Munro (thomas.mu...@gmail.com) wrote:
> > > Hmm. Every time I try to think of a protocol change for the
> > > restore_command API that would be acceptable, I go around the same
> > > circle of thoughts about event flow and realise that what we really
> > > need for this is ... a WAL receiver...
> >
> > A WAL receiver, or an independent process which goes out ahead and
> > fetches WAL..?
>
> What I really meant was: why would you want this over streaming rep?

I have to admit to being pretty confused as to this question and maybe
I'm just not understanding. Why wouldn't this patch be helpful for
streaming replication too..?

If I follow correctly, this patch will scan ahead in the WAL and let
the kernel know that certain blocks will be needed soon. Ideally,
though I don't think it does yet, we'd only do that for blocks that
aren't already in shared buffers, and only for non-FPIs (even better if
we can skip past pages for which we already, recently, passed an FPI).

The biggest caveat here, it seems to me anyway, is that for this to
actually help you need to be running with checkpoints that are larger
than shared buffers, as otherwise all the pages we need will be in
shared buffers already, thanks to FPIs bringing them in, except when
running with hot standby, right?

In the hot standby case, other random pages could be getting pulled in
to answer user queries and therefore this would be quite helpful to
minimize the amount of time required to replay WAL, I would think.

Naturally, this isn't very interesting if we're just always able to
keep up with the primary, but that's certainly not always the case.

> I just noticed this thread proposing to retire pg_standby on that
> basis:
>
> https://www.postgresql.org/message-id/flat/20201029024412.GP5380%40telsasoft.com
>
> I'd be happy to see that land, to fix this problem with my plan. But
> are there other people writing restore scripts that block that would
> expect them to work on PG14?

Ok, I think I finally get the concern that you're raising here-
basically that if a restore command was written to sit around and wait
for WAL segments to arrive, instead of just returning to PG and saying
"WAL segment not found", that this would be a problem if we are running
out ahead of the applying process and asking for WAL.

The thing is- that's an outright broken restore command script in the
first place.
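
To make that concrete, here is a rough sketch of what I'd consider a
well-behaved restore command: it copies the segment if it's in the
archive and otherwise returns nonzero right away, with no waiting. This
is only an illustration; the script name, the archive path, and the
helper names are made up, and the only things taken from PG itself are
the %f (file name) and %p (destination path) placeholders.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical archive location; adjust for the actual setup.
ARCHIVE_DIR = Path("/mnt/server/archivedir")


def restore(archive_dir, wal_name, dest_path):
    """Copy wal_name from the archive to dest_path.

    Returns 0 on success and 1 if the file is not in the archive.
    It never sleeps or polls for the file to show up.
    """
    src = Path(archive_dir) / wal_name
    if not src.is_file():
        # Not archived (yet): report failure immediately.
        return 1
    shutil.copy(src, dest_path)
    return 0


def main(argv):
    """argv is [file_name, dest_path], i.e. what %f and %p expand to."""
    if len(argv) != 2:
        return 2
    return restore(ARCHIVE_DIR, argv[0], argv[1])

# As a script entry point one would end the file with:
#     sys.exit(main(sys.argv[1:]))
```

Assuming that's saved as restore_wal.py (again, a made-up name), it
would be wired up with something like
restore_command = 'python3 /path/to/restore_wal.py %f %p'.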

If PG is in standby mode, we'll ask again if we get an error result
indicating that the WAL file wasn't found. The restore command
documentation is quite clear on this point:

    The command will be asked for file names that are not present in the
    archive; it must return nonzero when so asked.

There's no "it can wait around for the next file to show up if it wants
to" in there- it *must* return nonzero when asked for files that don't
exist.

So, I don't think that we really need to stress over this. The fact
that pg_standby offers options to have it wait, instead of just
returning a non-zero error-code and letting the loop that we already do
in the core code handle it, seems like it's really just a legacy thing
from before we were doing that and probably should have been ripped out
long ago... Even more reason to get rid of pg_standby tho, imv, we
haven't been properly adjusting it when we've been making changes to
the core code, it seems.

Thanks,

Stephen