I updated the original patch to 1) skip copydir() if either src path or dst parent path is missing in dbase_redo(). Both missing cases seem to be possible. For the src path missing case, mkdir_p() is meaningless. It seems that moving the directory existence check step to dbase_redo() has less impact on other code.
2) Fixed dbase_desc(). Now the xlog output looks correct. rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn: 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to pg_tblspc/16384/PG_12_201904281/16386 rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn: 0/01638EB8, prev 0/01638E40, desc: DROP dir pg_tblspc/16384/PG_12_201904281/16386 I'm not familiar with the TAP test details previously. I learned a lot about how to test such case from Kyotaro's patch series.👍 On Sun, Apr 28, 2019 at 3:33 PM Paul Guo <p...@pivotal.io> wrote: > > On Wed, Apr 24, 2019 at 4:14 PM Kyotaro HORIGUCHI < > horiguchi.kyot...@lab.ntt.co.jp> wrote: > >> Mmm. I posted to wrong thread. Sorry. >> >> At Tue, 23 Apr 2019 16:39:49 +0900 (Tokyo Standard Time), Kyotaro >> HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in < >> 20190423.163949.36763221.horiguchi.kyot...@lab.ntt.co.jp> >> > At Tue, 23 Apr 2019 13:31:58 +0800, Paul Guo <p...@pivotal.io> wrote >> in <caeet0zecwz57z2yfwrds43b3tfqppdswmbjgmd43xrxlt41...@mail.gmail.com> >> > > Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this >> database >> > > create redo error, but I suspect some other kind of redo, which >> depends on >> > > the files under the directory (they are not copied since the >> directory is >> > > not created) and also cannot be covered by the invalid page mechanism, >> > > could fail. Thanks. >> > >> > If recovery starts from just after tablespace creation, that's >> > simple. The Symlink to the removed tablespace is already removed >> > in the case. Hence server innocently create files directly under >> > pg_tblspc, not in the tablespace. Finally all files that were >> > supposed to be created in the removed tablespace are removed >> > later in recovery. >> > >> > If recovery starts from recalling page in a file that have been >> > in the tablespace, XLogReadBufferExtended creates one (perhaps >> > directly in pg_tblspc as described above) and the files are >> > removed later in recoery the same way to above. This case doen't >> > cause FATAL/PANIC during recovery even in master. >> > >> > XLogReadBufferExtended@xlogutils.c >> > | * Create the target file if it doesn't already exist. This lets us >> cope >> > | * if the replay sequence contains writes to a relation that is later >> > | * deleted. (The original coding of this routine would instead >> suppress >> > | * the writes, but that seems like it risks losing valuable data if the >> > | * filesystem loses an inode during a crash. Better to write the data >> > | * until we are actually told to delete the file.) >> > >> > So buffered access cannot be a problem for the reason above. The >> > remaining possible issue is non-buffered access to files in >> > removed tablespaces. This is what I mentioned upthread: >> > >> > me> but I haven't checked this covers all places where need the same >> > me> treatment. >> >> RM_DBASE_ID is fixed by the patch. >> >> XLOG/XACT/CLOG/MULTIXACT/RELMAP/STANDBY/COMMIT_TS/REPLORIGIN/LOGICALMSG: >> - are not relevant. >> >> HEAP/HEAP2/BTREE/HASH/GIN/GIST/SEQ/SPGIST/BRIN/GENERIC: >> - Resources works on buffer is not affected. >> >> SMGR: >> - Both CREATE and TRUNCATE seems fine. >> >> TBLSPC: >> - We don't nest tablespace directories. No Problem. >> >> I don't find a similar case. > > > I took some time in digging into the related code. It seems that ignoring > if the dst directory cannot be created directly > should be fine since smgr redo code creates tablespace path finally by > calling TablespaceCreateDbspace(). > What's more, I found some more issues. > > 1) The below error message is actually misleading. > > 2019-04-17 14:52:14.951 CST [23030] FATAL: could not create directory > "pg_tblspc/65546/PG_12_201904072/65547": No such file or directory > 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for > Database/CREATE: copy dir 1663/1 to 65546/65547 > > That should be due to dbase_desc(). It could be simply fixed following the > code logic in GetDatabasePath(). > > 2) It seems that src directory could be missing then > dbase_redo()->copydir() could error out. For example, > > \!rm -rf /tmp/tbspace1 > \!mkdir /tmp/tbspace1 > \!rm -rf /tmp/tbspace2 > \!mkdir /tmp/tbspace2 > create tablespace tbs1 location '/tmp/tbspace1'; > create tablespace tbs2 location '/tmp/tbspace2'; > create database db1 tablespace tbs1; > alter database db1 set tablespace tbs2; > drop tablespace tbs1; > > Let's say, the standby finishes all replay but redo lsn on pg_control is > still the point at 'alter database', and then > kill postgres, then in theory when startup, dbase_redo()->copydir() will > ERROR since 'drop tablespace tbs1' > has removed the directories (and symlink) of tbs1. Below simple code > change could fix that. > > diff --git a/src/backend/commands/dbcommands.c > b/src/backend/commands/dbcommands.c > index 9707afabd9..7d755c759e 100644 > --- a/src/backend/commands/dbcommands.c > +++ b/src/backend/commands/dbcommands.c > @@ -2114,6 +2114,15 @@ dbase_redo(XLogReaderState *record) > */ > FlushDatabaseBuffers(xlrec->src_db_id); > > + /* > + * It is possible that the source directory is missing if > + * we are re-replaying the xlog while subsequent xlogs > + * drop the tablespace in previous replaying. For this > + * we just skip. > + */ > + if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode))) > + return; > + > /* > * Copy this subdirectory to the new location > * > > If we want to fix the issue by ignoring the dst path create failure, I do > not think we should do > that in copydir() since copydir() seems to be a common function. I'm not > sure whether it is > used by some extensions or not. If no maybe we should move the dst patch > create logic > out of copydir(). > > Also I'd suggest we should use pg_mkdir_p() in TablespaceCreateDbspace() > to replace > the code block includes a lot of > get_parent_directory(), MakePGDirectory(), etc even it > is not fixing a bug since pg_mkdir_p() code change seems to be more > graceful and simpler. > > Whatever ignore mkdir failure or mkdir_p, I found that these steps seem to > be error-prone > along with postgre evolving since they are hard to test and also we are > not easy to think out > various potential bad cases. Is it possible that we should do real > checkpoint (flush & update > redo lsn) when seeing checkpoint xlogs for these operations? This will > slow down standby > but master also does this and also these operations are not usual, > espeically it seems that it > does not slow down wal receiving usually? > > > >
v2-0001-skip-copydir-if-either-src-directory-or-dst-direc.patch
Description: Binary data