On Tue, May 12, 2020 at 12:55:37PM +0900, Fujii Masao wrote:
> On 2020/05/12 9:42, Paul Guo wrote:
>> 1. StartupXLOG() does fsync on the whole data directory early in
>> crash recovery. I'm wondering if we could skip some
>> directories (at least pg_log/ and the table directories), since WAL
>> etc. could ensure consistency.
>
> I agree that we can skip the log directory, but I'm not sure if skipping
> the table directories is really safe. Also ISTM that we can skip directories
> whose contents are removed or zeroed during recovery,
> for example pg_snapshots, pg_subtrans, etc.
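The directory-skipping idea above can be sketched as a recursive walk that fsyncs regular files but prunes a fixed list of directory names. This is a minimal illustration in plain POSIX C, not PostgreSQL's startup code: the helpers fsync_tree() and should_skip_dir() are hypothetical, and the skip list is an assumed example built from the names mentioned in this thread, not the real excludeDirContents[] array. Unlike the real code path, it ignores symlinks (such as a symlinked pg_wal or tablespace links) and does not fsync the directories themselves.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical skip list; an assumed example, not excludeDirContents[]. */
static const char *const skip_dirs[] = {
    "pg_log", "pg_snapshots", "pg_subtrans", NULL
};

/* Return true if a directory with this name should not be descended into. */
static bool
should_skip_dir(const char *name)
{
    for (int i = 0; skip_dirs[i] != NULL; i++)
        if (strcmp(name, skip_dirs[i]) == 0)
            return true;
    return false;
}

/*
 * Recursively fsync every regular file below "path", pruning skipped
 * directories. Symlinks are ignored. Returns the number of files
 * fsynced, or -1 on the first error.
 */
static int
fsync_tree(const char *path)
{
    DIR        *dir;
    struct dirent *de;
    int         count = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        char        sub[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
        if (lstat(sub, &st) != 0)
        {
            closedir(dir);
            return -1;
        }

        if (S_ISDIR(st.st_mode))
        {
            int         n;

            if (should_skip_dir(de->d_name))
                continue;       /* contents rebuilt or unneeded: skip */
            n = fsync_tree(sub);
            if (n < 0)
            {
                closedir(dir);
                return -1;
            }
            count += n;
        }
        else if (S_ISREG(st.st_mode))
        {
            int         fd = open(sub, O_RDONLY);

            if (fd < 0 || fsync(fd) != 0)
            {
                if (fd >= 0)
                    close(fd);
                closedir(dir);
                return -1;
            }
            close(fd);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Whether skipping the table directories is safe is the open question in the thread; the sketch only shows the mechanics of pruning.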

Basically, the directories listed in excludeDirContents[] in basebackup.c.

>> RecreateTwoPhaseFile() writes a state file for a prepared
>> transaction and does fsync. It might be good to fsync all the
>> files once after writing them, given that the kernel is able to
>> flush asynchronously while those file contents are being written. If
>> TwoPhaseState->numPrepXacts is large, we could do batching to
>> avoid the fd resource limit. I have not tested this yet, but it
>> should be able to speed up checkpoints/restartpoints a bit.
>
> It seems worth making the patch and measuring the performance improvement.

You would need to do some micro-benchmarking here; for that, you could plug
pg_rusage_init() & co. into this code path with many 2PC files present at
the same time. However, I believe that this is not really worth the
potential code complications.
--
Michael
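The batching idea for RecreateTwoPhaseFile() can be sketched as follows: write the files of one batch while keeping their descriptors open, so the kernel may start writeback early, then fsync and close the whole batch before moving on. This caps the number of simultaneously open descriptors at the batch size. The helper write_files_batched_fsync() and its fixed 64-entry descriptor array are hypothetical; real two-phase state files have their own on-disk format and go through PostgreSQL's file APIs, none of which is modelled here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_BATCH 64            /* assumed upper bound on open fds */

/*
 * Hypothetical helper: write n files (paths[i] gets contents[i]), keeping
 * at most batch_size descriptors open; each batch is written first and
 * fsynced afterwards. Returns 0 on success, -1 on any failure.
 */
static int
write_files_batched_fsync(const char *const *paths,
                          const char *const *contents,
                          int n, int batch_size)
{
    int         fds[MAX_BATCH];

    assert(batch_size > 0 && batch_size <= MAX_BATCH);
    for (int start = 0; start < n; start += batch_size)
    {
        int         end = (start + batch_size < n) ? start + batch_size : n;
        int         opened = 0;
        int         rc = 0;

        /* Phase 1: write every file of the batch, leaving fds open. */
        for (int i = start; i < end; i++)
        {
            size_t      len = strlen(contents[i]);
            int         fd = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC, 0600);

            if (fd < 0)
            {
                rc = -1;
                break;
            }
            fds[opened++] = fd;
            if (write(fd, contents[i], len) != (ssize_t) len)
            {
                rc = -1;
                break;
            }
        }

        /* Phase 2: fsync the batch (unless it already failed), then close. */
        for (int j = 0; j < opened; j++)
        {
            if (rc == 0 && fsync(fds[j]) != 0)
                rc = -1;
            close(fds[j]);
        }
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

Whether this is actually faster than one fsync per file is exactly what the micro-benchmarking suggested above would need to show.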