On 2020/05/12 9:42, Paul Guo wrote:
Hello hackers,

1. StartupXLOG() fsyncs the whole data directory early in crash 
recovery. I'm wondering if we could skip some directories (at least 
pg_log/ and the table directories), since WAL etc. could ensure consistency.

I agree that we can skip the log directory, but I'm not sure that skipping
the table directories is really safe. Also, ISTM that we can skip the
directories whose contents are removed or zeroed during recovery,
for example pg_snapshots, pg_subtrans, etc.

Here is the related code.

       if (ControlFile->state != DB_SHUTDOWNED &&
           ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
       {
           RemoveTempXlogFiles();
           SyncDataDirectory();
       }

I raise this because I saw an issue in a real production environment where the 
startup process needed 10+ seconds to start WAL replay after being relaunched 
due to elog(PANIC) (it was seen on Greenplum, a Postgres-based product, but it 
is a common issue in Postgres as well). I strongly suspect the delay was mostly 
due to this. Also, fsync on public clouds is noticeably slower than on local 
storage, so the slowdown should be more severe there. If we at least skipped 
fsync on the table directories, we could avoid a lot of file fsyncs, which may 
save many seconds during crash recovery.
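
To make the proposal concrete, here is a minimal sketch of a skip list that a 
directory walk could consult before fsyncing a top-level entry. It is not 
PostgreSQL code: the function name skip_dir_for_crash_fsync() and the array 
skip_fsync_dirs are hypothetical, and the entries are only the directories 
named in this thread. Whether any of them (and the table directories in 
particular) are really safe to skip is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical skip list of top-level directories under the data directory.
 * The entries are the ones mentioned in this thread; the log directory name
 * depends on the log_directory setting, "pg_log" is used here as in the
 * message. Table directories are deliberately NOT listed, since skipping
 * them has not been shown to be safe.
 */
const char *const skip_fsync_dirs[] = {
    "pg_log",
    "pg_snapshots",
    "pg_subtrans",
};

/*
 * Return true if the named top-level directory is on the skip list, i.e.
 * a crash-recovery fsync pass would not descend into it.
 */
bool
skip_dir_for_crash_fsync(const char *name)
{
    size_t      i;
    size_t      n = sizeof(skip_fsync_dirs) / sizeof(skip_fsync_dirs[0]);

    for (i = 0; i < n; i++)
    {
        if (strcmp(name, skip_fsync_dirs[i]) == 0)
            return true;
    }
    return false;
}
```

A directory walk would call this for each top-level entry and continue to the 
next one when it returns true. Everything not listed is still fsynced.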

2.  CheckPointTwoPhase()

This may be a small issue.

See the code below,

for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
     RecreateTwoPhaseFile(gxact->xid, buf, len);

RecreateTwoPhaseFile() writes a state file for a prepared transaction and 
fsyncs it. It might be better to fsync all the files once after writing them, 
given that the kernel can flush asynchronously while those file contents are 
being written. If TwoPhaseState->numPrepXacts is large, we could batch the 
work to stay under the file descriptor limit. I have not tested this yet, but 
it should speed up checkpoints/restartpoints a bit.
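
The write-then-fsync batching idea can be sketched in plain POSIX C as 
follows. This is only an illustration under stated assumptions, not a patch: 
StateFile, write_files_batched() and FSYNC_BATCH_SIZE are hypothetical names, 
the batch size of 64 is arbitrary, and details of the real code (CRC 
calculation, error reporting, fsync of the containing directory) are left out.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical cap on descriptors held open at once; the value is arbitrary. */
#define FSYNC_BATCH_SIZE 64

/* Hypothetical description of one state file to be written. */
typedef struct StateFile
{
    const char *path;
    const void *buf;
    size_t      len;
} StateFile;

/* Write the whole buffer, looping on short writes. Any error is a failure. */
static int
write_all(int fd, const char *p, size_t len)
{
    while (len > 0)
    {
        ssize_t     n = write(fd, p, len);

        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Write all files in batches: within a batch, every file is written first
 * and only then is each one fsynced, so the kernel may start flushing
 * earlier files while later ones are still being written.
 * Returns 0 on success, -1 on the first failure.
 */
int
write_files_batched(const StateFile *files, size_t nfiles)
{
    int         fds[FSYNC_BATCH_SIZE];
    size_t      start;

    for (start = 0; start < nfiles; start += FSYNC_BATCH_SIZE)
    {
        size_t      n = nfiles - start;
        size_t      opened = 0;
        size_t      i;
        int         rc = 0;

        if (n > FSYNC_BATCH_SIZE)
            n = FSYNC_BATCH_SIZE;

        /* Phase 1: create and write each file, keeping it open. */
        for (i = 0; i < n; i++)
        {
            const StateFile *f = &files[start + i];
            int         fd = open(f->path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

            if (fd < 0)
            {
                rc = -1;
                break;
            }
            fds[opened++] = fd;
            if (write_all(fd, f->buf, f->len) != 0)
            {
                rc = -1;
                break;
            }
        }

        /* Phase 2: fsync (unless already failed) and close every opened file. */
        for (i = 0; i < opened; i++)
        {
            if (rc == 0 && fsync(fds[i]) != 0)
                rc = -1;
            if (close(fds[i]) != 0)
                rc = -1;
        }

        if (rc != 0)
            return -1;
    }
    return 0;
}
```

Holding at most FSYNC_BATCH_SIZE descriptors open per batch is what keeps a 
large numPrepXacts from running into the fd limit. Whether this is actually 
faster than fsyncing each file right after writing it would need measuring.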

Any thoughts?

It seems worth making the patch and measuring the performance improvement.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

