On Mon, May 27, 2019 at 9:39 PM Paul Guo <p...@pivotal.io> wrote:

>
>
> On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>> Hello.
>>
>> At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <p...@pivotal.io> wrote in <
>> caeet0zf9yn4daxyuflzocayyxuff1ms_oqwea+rwv3gha5q...@mail.gmail.com>
>> > Thanks for the reply.
>> >
>> > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
>> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> >
>> > >
>> > > +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
>> > > +      {
>> > >
>> > > This patch is allowing missing source and destination directory
>> > > even in consistent state. I don't think it is safe.
>> > >
>> >
>> > I do not understand this. Can you elaborate?
>>
>> Suppose we were recoverying based on a backup at LSN1 targeting
>> to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
>> is called as "consistency point", before where the database is
>> not consistent. It's because we are applying WAL recored older
>> than those that were already applied in the second trial. The
>> same can be said for crash recovery, where LSN1 is the latest
>> checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
>>
>> Creation of an existing directory or dropping of a non-existent
>> directory are apparently inconsistent or "broken" so we should
>> stop recovery when seeing such WAL records while database is in
>> consistent state.
>>
>
> This seems to be hard to detect. I thought using invalid_page mechanism
> long ago,
> but it seems to be hard to fully detect a dropped tablespace.
>
> > > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
>> > > >
>> > > > rmgr: Database    len (rec/tot):     42/    42, tx:        486, lsn:
>> > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > > >
>> > > > rmgr: Database    len (rec/tot):     34/    34, tx:        487, lsn:
>> > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > >
>> > > WAL records don't convey such information. The previous
>> > > description seems right to me.
>> > >
>> >
>> > 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
>> > Database/CREATE: copy dir 1663/1 to 65546/65547
>> > The directories are definitely wrong and misleading.
>>
>> The original description is right in the light of how the server
>> recognizes. The record exactly says that "copy dir 1663/1 to
>> 65546/65547" and the latter path is converted in filesystem layer
>> via a symlink.
>>
>
> In either $PG_DATA/pg_tblspc or symlinked real tablespace directory,
> there is an additional directory like PG_12_201905221 between
> tablespace oid and database oid. See the directory layout as below,
> so the directory info in xlog dump output was not correct.
>
> $ ls -lh data/pg_tblspc/
>
>
> total 0
>
>
> lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2
>
>
> $ ls -lh /tmp/2
>
>
> total 0
>
>
> drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221
>
>>
>>
>> > > > > Also I'd suggest we should use pg_mkdir_p() in
>> > > TablespaceCreateDbspace()
>> > > > > to replace
>> > > > > the code block includes a lot of
>> > > > > get_parent_directory(), MakePGDirectory(), etc even it
>> > > > > is not fixing a bug since pg_mkdir_p() code change seems to be
>> more
>> > > > > graceful and simpler.
>> > >
>> > > But I don't agree to this. pg_mkdir_p goes above two-parents up,
>> > > which would be unwanted here.
>> > >
>> > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
>> > This change just makes the code concise. Though in theory the change is
>> not
>> > needed.
>>
>> We don't want to create tablespace direcotory after concurrent
>> DROPing, as the comment just above is saying:
>>
>> |  * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
>> |  * or TablespaceCreateDbspace is running concurrently.
>>
>> If the concurrent DROP TABLESPACE destroyed the grand parent
>> directory, we mustn't create it again.
>>
>
> Yes, this is a good reason to keep the original code. Thanks.
>
> By the way, based on your previous test patch I added another test which
> could easily detect
> the missing src directory case.
>
>

I updated the patch to v3. In this version, we skip the error if copydir
fails due to missing src/dst directory,
but to make sure the ignoring is legal, I add a simple log/forget mechanism
(Using List) similar to the xlog invalid page
checking mechanism. Two tap tests are included. One is actually from a
previous patch by Kyotaro in this
email thread and another is added by me. In addition, dbase_desc() is fixed
to make the message accurate.

Thanks.

Attachment: v3-0001-skip-copydir-if-either-src-directory-or-dst-direc.patch
Description: Binary data

Reply via email to