+1 to the idea. I don't see a reason why checkpointer has to do all of
that. Keeping checkpoint to minimal essential work helps servers recover
faster in the event of a crash.

RemoveOldXlogFiles is also an O(N) operation that can at least be avoided
during the end of recovery (CHECKPOINT_END_OF_RECOVERY) checkpoint. When a
sufficient number of WAL files accumulated and the previous checkpoint did
not get a chance to cleanup, this can increase the unavailability of the
server.

    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);



On Wed, Dec 1, 2021 at 12:24 PM Bossart, Nathan <bossa...@amazon.com> wrote:

> Hi hackers,
>
> Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to
> avoid individually syncing all database files after a crash.  However,
> as noted earlier this year [0], there are still a number of O(n) tasks
> that affect startup and checkpointing that I'd like to improve.
> Below, I've attempted to summarize each task and to offer ideas for
> improving matters.  I'll likely split each of these into its own
> thread, given there is community interest for such changes.
>
> 1) CheckPointSnapBuild(): This function loops through
>    pg_logical/snapshots to remove all snapshots that are no longer
>    needed.  If there are many entries in this directory, this can take
>    a long time.  The note above this function indicates that this is
>    done during checkpoints simply because it is convenient.  IIUC
>    there is no requirement that this function actually completes for a
>    given checkpoint.  My current idea is to move this to a new
>    maintenance worker.
> 2) CheckPointLogicalRewriteHeap(): This function loops through
>    pg_logical/mappings to remove old mappings and flush all remaining
>    ones.  IIUC there is no requirement that the "remove old mappings"
>    part must complete for a given checkpoint, but the "flush all
>    remaining" portion allows replay after a checkpoint to only "deal
>    with the parts of a mapping that have been written out after the
>    checkpoint started."  Therefore, I think we should move the "remove
>    old mappings" part to a new maintenance worker (probably the same
>    one as for 1), and we should consider using syncfs() for the "flush
>    all remaining" part.  (I suspect the main argument against the
>    latter will be that it could cause IO spikes.)
> 3) RemovePgTempFiles(): This step can delay startup if there are many
>    temporary files to individually remove.  This step is already
>    optionally done after a crash via the remove_temp_files_after_crash
>    GUC.  I propose that we have startup move the temporary file
>    directories aside and create new ones, and then a separate worker
>    (probably the same one from 1 and 2) could clean up the old files.
> 4) StartupReorderBuffer(): This step deletes logical slot data that
>    has been spilled to disk.  This code appears to be written to avoid
>    deleting different types of files in these directories, but AFAICT
>    there shouldn't be any other files.  Therefore, I think we could do
>    something similar to 3 (i.e., move the directories aside during
>    startup and clean them up via a new maintenance worker).
>
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing.  I imagine such a
> process would come in handy down the road, too.  WDYT?
>
> Nathan
>
> [0] https://postgr.es/m/32B59582-AA6C-4609-B08F-2256A271F7A5%40amazon.com
>
>

Reply via email to