On Fri, Dec 5, 2025 at 5:30 AM MentionTheElephant <[email protected]> wrote:
> The central question is: What should I examine further to understand
> why checkpoint processing becomes the dominant bottleneck during
> restore, despite fsync=off, synchronous_commit=off, and excellent
> random write latency?

I don't see any evidence that checkpoint is the dominant bottleneck.
Checkpoints are not (usually) meant to be fast. They pace themselves to
finish in the allotted time, causing as little drama as possible while
they are underway. The whole system shouldn't freeze up while waiting
for a checkpoint to run.

> Checkpoints appear to stall the entire restore process.

What makes it appear that way? Is it just that they take a long time to
finish, or is there some other evidence you haven't described?

> Example log excerpts show checkpoint write times consistently in the
> range of 600–1100 seconds, with large numbers of buffers written (from
> hundreds of thousands to over 1.6 million).

You told it to shoot for checkpoint_timeout * checkpoint_completion_target
= 3600 * 0.9 = 3240 seconds. If anything, it is weird that it takes so
little time compared to what you authorized it to take. As for the number
of buffers, it writes as many as it needs to write to get the job done.
Are the checkpoints driven by time, or by WAL volume?

The time reported for the write phase includes not only time spent doing
the writes, but also time spent sleeping so as to pace the checkpoint to
finish on schedule. Perhaps those two times should be reported separately.

I think that you have misinterpreted the evidence you have on hand, and
are now looking for an explanation for something which isn't true in the
first place.

What is the percent IO wait (called "wa") reported by `top` or `vmstat`
while the restore is running?

If you run with --verbose, you can see each step as it starts. Which
steps take the most time?
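Concretely, something like the following, run while the restore is in
progress (this assumes a Linux host and a custom-format dump restored
with pg_restore; the database name, job count, and dump path below are
placeholders for whatever you actually use):

    # report CPU/IO stats every 5 seconds; watch the "wa" column
    vmstat 5

    # confirm the configured checkpoint pacing targets
    psql -c "SHOW checkpoint_timeout" -c "SHOW checkpoint_completion_target"

    # re-run the restore with per-step progress messages
    pg_restore --verbose -j 4 -d mydb /path/to/dump.custom

Also, if log_checkpoints is on, the "checkpoint starting:" line in the
server log says what triggered each checkpoint ("time" vs "wal"), so you
can answer the time-versus-WAL question from the logs you already have.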
Cheers,

Jeff