Hi,

On 2021-07-30 10:16:44 -0400, Robert Haas wrote:
> 2021-07-30 09:39:43.579 EDT [63702] LOG: redo starts at 0/14A2F48
> 2021-07-30 09:39:44.129 EDT [63702] LOG: redo done at 0/15F48230
> system usage: CPU: user: 0.25 s, system: 0.25 s, elapsed: 0.55 s
> 2021-07-30 09:39:44.129 EDT [63702] LOG: crash recovery complete:
> wrote 36517 buffers (222.9%); dirtied 52985 buffers; read 7 buffers
>
> Now I really think that information on the number of buffers touched
> and how long it took is way more useful than user and system time.
> Knowing how much user and system time were spent doesn't really tell
> you anything, but a count of buffers touched gives you some meaningful
> idea of how much work recovery did, and whether I/O was slow.

I don't agree with that? If (user+system) << wall then it is very likely
that recovery is IO bound. If system is a large percentage of wall, then
shared buffers is likely too small (or we're replacing the wrong buffers),
because you spend a lot of time copying data in/out of the kernel page
cache. If user is the majority, you're CPU bound.

Without user & system time it's much harder to figure that out - at least
for me.

> In your patch, there's no end-of-recovery checkpoint -- you just
> trigger a checkpoint instead of waiting for it. I think it's probably
> better to make those two cases work the same. The end-of-recovery
> record isn't needed to change the TLI as it is in the promotion case,
> but (1) it seems better to have fewer code paths and (2) it might be
> good for debuggability.

+1

In addition, the end-of-recovery record is also good for e.g. hot standby,
logical decoding, etc, because it's a point where no write transactions
can be in progress...

Greetings,

Andres Freund
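
A rough sketch of the user/system/wall reasoning above, in case it helps to see it spelled out. The function name, the labels, and the 0.5 cutoffs are assumptions made up for illustration; nothing here is taken from PostgreSQL, and "<<" has no exact threshold.

```python
def classify_recovery(user_s, system_s, elapsed_s, ratio=0.5):
    """Guess where recovery time went from CPU and wall-clock seconds.

    `ratio` is an arbitrary illustrative cutoff, not a PostgreSQL setting.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")

    cpu_s = user_s + system_s
    # (user + system) << wall: the process mostly waited, likely on I/O.
    if cpu_s < ratio * elapsed_s:
        return "io-bound"
    # System time is a large share of wall: lots of copying to/from the
    # kernel page cache, so shared buffers may be too small.
    if system_s >= ratio * elapsed_s:
        return "kernel-heavy"
    # User time is the majority of wall: CPU bound.
    if user_s > ratio * elapsed_s:
        return "cpu-bound"
    return "mixed"


if __name__ == "__main__":
    # Numbers from the quoted log: user 0.25 s, system 0.25 s, elapsed 0.55 s.
    print(classify_recovery(0.25, 0.25, 0.55))
```

With the numbers from the quoted log, none of the three conditions holds under these cutoffs, so the sketch reports "mixed". Different cutoffs would give a different reading.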