* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jan 17, 2017 at 4:46 PM, Stephen Frost <sfr...@snowman.net> wrote:
> >> But what if we're restarting after, say, rebooting? Then there's
> >> nobody to see the progress messages, perhaps. The system just seems
> >> to take an eternity to return to the usual runlevel.
> >
> > Not unlike an fsck.
>
> Right. That's why people developed journaled filesystems like ext3
> and ext4 - because waiting for increasingly-large disks to be checked
> for errors sucked. And that made fsck times vastly lower and everyone
> said "huzzah". Because waiting for things to happen stinks, and
> people want to do as little of it as is reasonably possible.

Sure, but they still have a recovery process that they go through when
recovering from a crash, just as we do, and those things which are
waiting for the filesystem have to wait until it is ready.

If a PG user has an issue with waiting for recovery to finish then they
should make checkpoints happen more often (typically by reducing
checkpoint_timeout...), so that we don't have as much to replay through
since the last one. Just as a user could reduce the journal size of
ext4 if they're worried that it'll take too long for the system to
replay the last set of journaled entries during recovery after a crash.

> >> I saw the discussion on this thread, but I didn't realize that it
> >> meant that pg_ctl was going to wait for crash recovery, let alone
> >> archive recovery. That seems not good.
> >
> > I disagree. The database isn't done starting up until it's gone through
> > recovery. If there are other bits of the system which are depending on
> > the database being online, shouldn't they wait until it's actually
> > online to be started?
>
> They aren't necessarily depending on the database; they could be
> entirely unrelated.

Not in modern boot systems today... If they aren't depending on the
database then they can get started as soon as everything they *do*
depend on is up and running. Those daemons or what-have-you which
depend on the database say so through the init dependency system.

> > Admittedly, such processes should probably be prepared to try
> > reconnecting to the database on a failure, but I don't see this as
> > really all that different from how a journaling filesystem operates.
>
> A journaling filesystem doesn't have a mode where it enters archive
> recovery mode and stays there permanently leaving the system in an
> unusable state.

Now there I agree with you, whatever we're doing with pg_ctl here
shouldn't mean that it never returns, but is that actually what happens
with pg_ctl --wait?
If so, then that's what is wrong, not this particular patch which is
just making --wait the default.

If I'm understanding your concern correctly, you're worried about the
case of a cold standby where the database is only replaying WAL but not
configured to come up as a hot standby and therefore PQping() won't ever
succeed?

Except, that isn't what would ever happen because the timeout for the
--wait option is 60s, according to the pg_ctl docs anyway, after which
it'll throw an error and say the server didn't start up, even if it
would have after a few minutes. One could wonder why we have the
default set to a value lower than checkpoint_timeout, making it entirely
likely that the database recovery would take longer than the timeout on
a busy/high-volume server that's actually checkpointing on-time, but
just barely.

Perhaps we need a way for pg_ctl to realize a cold-standby case and
throw an error or warning if --wait is specified then, but that hardly
seems like the common use-case. It also wouldn't make any sense to have
anything in the init system which depended on PG being up in such a case
because, well, PG isn't ever going to be 'up'.

Thanks!

Stephen