Thanks for the reply. On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) > + { > > This patch is allowing missing source and destination directory > even in consistent state. I don't think it is safe. > I do not understand this. Can you elaborate? > > > > + ereport(WARNING, > + (errmsg("directory \"%s\" for copydir() does not exists." > + "It is possibly expected. Skip copydir().", > + parent_path))); > > This message seems unfriendly to users, or it seems like an elog > message. How about something like this. The same can be said for > the source directory. > > | WARNING: skipped creating database directory: "%s" > | DETAIL: The tabelspace %u may have been removed just before crash. > Yeah. Looks better. > > # I'm not confident in this at all:( > > > 2) Fixed dbase_desc(). Now the xlog output looks correct. > > > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn: > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to > > pg_tblspc/16384/PG_12_201904281/16386 > > > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn: > > 0/01638EB8, prev 0/01638E40, desc: DROP dir > > pg_tblspc/16384/PG_12_201904281/16386 > > WAL records don't convey such information. The previous > description seems right to me. > 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for Database/CREATE: copy dir 1663/1 to 65546/65547 The directories are definitely wrong and misleading. > > > Also I'd suggest we should use pg_mkdir_p() in > TablespaceCreateDbspace() > > > to replace > > > the code block includes a lot of > > > get_parent_directory(), MakePGDirectory(), etc even it > > > is not fixing a bug since pg_mkdir_p() code change seems to be more > > > graceful and simpler. > > But I don't agree to this. pg_mkdir_p goes above two-parents up, > which would be unwanted here. > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'. This change just makes the code concise. Though in theory the change is not needed. > > > Whatever ignore mkdir failure or mkdir_p, I found that these steps > seem to > > > be error-prone > > > along with postgre evolving since they are hard to test and also we are > > > not easy to think out > > > various potential bad cases. Is it possible that we should do real > > > checkpoint (flush & update > > > redo lsn) when seeing checkpoint xlogs for these operations? This will > > > slow down standby > > > but master also does this and also these operations are not usual, > > > espeically it seems that it > > > does not slow down wal receiving usually? > > That dramatically slows recovery (not replication) if databases > are created and deleted frequently. That wouldn't be acceptable. > This behavior is rare and seems to have the same impact on master & standby from checkpoint/restartpoint. We do not worry about master so we should not worry about standby also.