Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Paul Guo Mon, 29 Apr 2019 23:34:51 -0700

I updated the original patch to

1) skip copydir() if either src path or dst parent path is missing in
dbase_redo(). Both missing cases seem to be possible. For the src path
missing case, mkdir_p() is meaningless. It seems that moving the directory
existence check step to dbase_redo() has less impact on other code.


2) Fixed dbase_desc(). Now the xlog output looks correct.

rmgr: Database    len (rec/tot):     42/    42, tx:        486, lsn:
0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
pg_tblspc/16384/PG_12_201904281/16386

rmgr: Database    len (rec/tot):     34/    34, tx:        487, lsn:
0/01638EB8, prev 0/01638E40, desc: DROP dir
pg_tblspc/16384/PG_12_201904281/16386

I'm not familiar with the TAP test details previously. I learned a lot
about how to test such case from Kyotaro's patch series.👍

On Sun, Apr 28, 2019 at 3:33 PM Paul Guo <p...@pivotal.io> wrote:

>
> On Wed, Apr 24, 2019 at 4:14 PM Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>> Mmm. I posted to wrong thread. Sorry.
>>
>> At Tue, 23 Apr 2019 16:39:49 +0900 (Tokyo Standard Time), Kyotaro
>> HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <
>> 20190423.163949.36763221.horiguchi.kyot...@lab.ntt.co.jp>
>> > At Tue, 23 Apr 2019 13:31:58 +0800, Paul Guo <p...@pivotal.io> wrote
>> in <caeet0zecwz57z2yfwrds43b3tfqppdswmbjgmd43xrxlt41...@mail.gmail.com>
>> > > Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this
>> database
>> > > create redo error, but I suspect some other kind of redo, which
>> depends on
>> > > the files under the directory (they are not copied since the
>> directory is
>> > > not created) and also cannot be covered by the invalid page mechanism,
>> > > could fail. Thanks.
>> >
>> > If recovery starts from just after tablespace creation, that's
>> > simple. The Symlink to the removed tablespace is already removed
>> > in the case. Hence server innocently create files directly under
>> > pg_tblspc, not in the tablespace. Finally all files that were
>> > supposed to be created in the removed tablespace are removed
>> > later in recovery.
>> >
>> > If recovery starts from recalling page in a file that have been
>> > in the tablespace, XLogReadBufferExtended creates one (perhaps
>> > directly in pg_tblspc as described above) and the files are
>> > removed later in recoery the same way to above. This case doen't
>> > cause FATAL/PANIC during recovery even in master.
>> >
>> > XLogReadBufferExtended@xlogutils.c
>> > | * Create the target file if it doesn't already exist.  This lets us
>> cope
>> > | * if the replay sequence contains writes to a relation that is later
>> > | * deleted.  (The original coding of this routine would instead
>> suppress
>> > | * the writes, but that seems like it risks losing valuable data if the
>> > | * filesystem loses an inode during a crash.  Better to write the data
>> > | * until we are actually told to delete the file.)
>> >
>> > So buffered access cannot be a problem for the reason above. The
>> > remaining possible issue is non-buffered access to files in
>> > removed tablespaces. This is what I mentioned upthread:
>> >
>> > me> but I haven't checked this covers all places where need the same
>> > me> treatment.
>>
>> RM_DBASE_ID is fixed by the patch.
>>
>> XLOG/XACT/CLOG/MULTIXACT/RELMAP/STANDBY/COMMIT_TS/REPLORIGIN/LOGICALMSG:
>>   - are not relevant.
>>
>> HEAP/HEAP2/BTREE/HASH/GIN/GIST/SEQ/SPGIST/BRIN/GENERIC:
>>   - Resources works on buffer is not affected.
>>
>> SMGR:
>>   - Both CREATE and TRUNCATE seems fine.
>>
>> TBLSPC:
>>   - We don't nest tablespace directories. No Problem.
>>
>> I don't find a similar case.
>
>
> I took some time in digging into the related code. It seems that ignoring
> if the dst directory cannot be created directly
> should be fine since smgr redo code creates tablespace path finally by
> calling TablespaceCreateDbspace().
> What's more, I found some more issues.
>
> 1) The below error message is actually misleading.
>
> 2019-04-17 14:52:14.951 CST [23030] FATAL:  could not create directory
> "pg_tblspc/65546/PG_12_201904072/65547": No such file or directory
> 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
> Database/CREATE: copy dir 1663/1 to 65546/65547
>
> That should be due to dbase_desc(). It could be simply fixed following the
> code logic in GetDatabasePath().
>
> 2) It seems that src directory could be missing then
> dbase_redo()->copydir() could error out. For example,
>
> \!rm -rf /tmp/tbspace1
> \!mkdir /tmp/tbspace1
> \!rm -rf /tmp/tbspace2
> \!mkdir /tmp/tbspace2
> create tablespace tbs1 location '/tmp/tbspace1';
> create tablespace tbs2 location '/tmp/tbspace2';
> create database db1 tablespace tbs1;
> alter database db1 set tablespace tbs2;
> drop tablespace tbs1;
>
> Let's say, the standby finishes all replay but redo lsn on pg_control is
> still the point at 'alter database', and then
> kill postgres, then in theory when startup, dbase_redo()->copydir() will
> ERROR since 'drop tablespace tbs1'
> has removed the directories (and symlink) of tbs1. Below simple code
> change could fix that.
>
> diff --git a/src/backend/commands/dbcommands.c
> b/src/backend/commands/dbcommands.c
> index 9707afabd9..7d755c759e 100644
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -2114,6 +2114,15 @@ dbase_redo(XLogReaderState *record)
>          */
>         FlushDatabaseBuffers(xlrec->src_db_id);
>
> +       /*
> +        * It is possible that the source directory is missing if
> +        * we are re-replaying the xlog while subsequent xlogs
> +        * drop the tablespace in previous replaying. For this
> +        * we just skip.
> +        */
> +       if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +           return;
> +
>         /*
>          * Copy this subdirectory to the new location
>          *
>
> If we want to fix the issue by ignoring the dst path create failure, I do
> not think we should do
> that in copydir() since copydir() seems to be a common function. I'm not
> sure whether it is
> used by some extensions or not. If no maybe we should move the dst patch
> create logic
> out of copydir().
>
> Also I'd suggest we should use pg_mkdir_p() in TablespaceCreateDbspace()
> to replace
> the code block includes a lot of
> get_parent_directory(), MakePGDirectory(), etc even it
> is not fixing a bug since pg_mkdir_p() code change seems to be more
> graceful and simpler.
>
> Whatever ignore mkdir failure or mkdir_p, I found that these steps seem to
> be error-prone
> along with postgre evolving since they are hard to test and also we are
> not easy to think out
> various potential bad cases. Is it possible that we should do real
> checkpoint (flush & update
> redo lsn) when seeing checkpoint xlogs for these operations? This will
> slow down standby
> but master also does this and also these operations are not usual,
> espeically it seems that it
> does not slow down wal receiving usually?
>
>
>
>

v2-0001-skip-copydir-if-either-src-directory-or-dst-direc.patch
Description: Binary data

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Reply via email to