Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Paul Guo Mon, 13 May 2019 02:38:37 -0700

Thanks for the reply.

On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
[email protected]> wrote:


>
> +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +      {
>
> This patch is allowing missing source and destination directory
> even in consistent state. I don't think it is safe.
>

I do not understand this. Can you elaborate?


>
>
>
> +        ereport(WARNING,
> +            (errmsg("directory \"%s\" for copydir() does not exists."
> +                "It is possibly expected. Skip copydir().",
> +                parent_path)));
>
> This message seems unfriendly to users, or it seems like an elog
> message. How about something like this. The same can be said for
> the source directory.
>
> | WARNING:  skipped creating database directory: "%s"
> | DETAIL:  The tabelspace %u may have been removed just before crash.
>

Yeah. Looks better.


>
> # I'm not confident in this at all:(
>
> > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> >
> > rmgr: Database    len (rec/tot):     42/    42, tx:        486, lsn:
> > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > pg_tblspc/16384/PG_12_201904281/16386
> >
> > rmgr: Database    len (rec/tot):     34/    34, tx:        487, lsn:
> > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > pg_tblspc/16384/PG_12_201904281/16386
>
> WAL records don't convey such information. The previous
> description seems right to me.
>

2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
Database/CREATE: copy dir 1663/1 to 65546/65547
The directories are definitely wrong and misleading.


> > > Also I'd suggest we should use pg_mkdir_p() in
> TablespaceCreateDbspace()
> > > to replace
> > > the code block includes a lot of
> > > get_parent_directory(), MakePGDirectory(), etc even it
> > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > graceful and simpler.
>
> But I don't agree to this. pg_mkdir_p goes above two-parents up,
> which would be unwanted here.
>
> I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
This change just makes the code concise. Though in theory the change is not
needed.


> > > Whatever ignore mkdir failure or mkdir_p, I found that these steps
> seem to
> > > be error-prone
> > > along with postgre evolving since they are hard to test and also we are
> > > not easy to think out
> > > various potential bad cases. Is it possible that we should do real
> > > checkpoint (flush & update
> > > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > > slow down standby
> > > but master also does this and also these operations are not usual,
> > > espeically it seems that it
> > > does not slow down wal receiving usually?
>
> That dramatically slows recovery (not replication) if databases
> are created and deleted frequently. That wouldn't be acceptable.
>

This behavior is rare and seems to have the same impact on master & standby
from checkpoint/restartpoint.
We do not worry about master so we should not worry about standby also.

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Reply via email to