Thanks for the review, please see the replies below.

> On Jan 5, 2021, at 9:07 AM, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> 
> At Wed, 8 Jul 2020 12:56:44 +0000, Paul Guo <gu...@vmware.com> wrote in 
>> On 2020/01/15 19:18, Paul Guo wrote:
>> I further fixed the last test failure (due to a small bug in the test, not 
>> in code). Attached are the new patch series. Let's see the CI pipeline 
>> result.
>> 
>> Thanks for updating the patches!
>> 
>> I started reading the 0003 patch.
>> 
>> The approach that the 0003 patch uses is not the perfect solution.
>> If the standby crashes after tblspc_redo() removes the directory and before
>> its subsequent COMMIT record is replayed, PANIC error would occur since
>> there can be some unresolved missing directory entries when we reach the
>> consistent state. The problem would very rarely happen, though...
>> Just idea; calling XLogFlush() to update the minimum recovery point just
>> before tblspc_redo() performs destroy_tablespace_directories() may be
>> safe and helpful for the problem?
> 
> It seems to me that what the current patch does is too complex.  What
> we need to do here is to remember every invalid operation then forget
> it when the prerequisite object is dropped.
> 
> When a table space is dropped before consistency is established, we
> don't need to care what has been performed inside the tablespace.  In
> this perspective, it is enough to remember tablespace ids when failed
> to do something inside it due to the absence of the tablespace and
> then forget it when we remove it.  We could remember individual
> database id to show them in error messages, but I'm not sure it's
> useful.  The reason log_invalid_page records block numbers is to allow
> the machinery handle partial table truncations, but this is not the
> case since dropping tablespace cannot leave some of containing
> databases.
> 
> As the result, we won't see an unresolved invalid operations in a
> dropped tablespace.
> 
> Am I missing something?

Yes, removing the database id from the hash key in the log/forget code should
be usually fine, but the previous code does stricter/safer checking.

Consider the scenario:

CREATE DATABASE newdb1 TEMPLATE template_db1;
CREATE DATABASE newdb2 TEMPLATE template_db2; <- in case the template_db2 
database directory is missing abnormally somehow.
DROP DATABASE template_db1;

The previous code could detect this but if we remove the database id in the 
code,
this bad scenario is skipped.

> 
> 
> dbase_redo:
> +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +      {
> +        XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
> 
> This means "record the belonging table space directory if it is not
> found OR it is not a directory". The former can be valid but the
> latter is unconditionally can not (I don't think we bother considering
> symlinks there).

Again this is a safer check, in the case the parent_path is a file for example 
somehow,
we should panic finally for the case and let the user checks and then does 
recovery again.

> 
> +    /*
> +     * Source directory may be missing.  E.g. the template database used
> +     * for creating this database may have been dropped, due to reasons
> +     * noted above.  Moving a database from one tablespace may also be a
> +     * partner in the crime.
> +     */
> +    if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +    {
> +      XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, 
> src_path);
> 
> This is a part of *creation* of the target directory. Lack of the
> source directory cannot be valid even if the source directory is
> dropped afterwards in the WAL stream and we can allow that if the
> *target* tablespace is dropped afterwards. As the result, as I
> mentioned above, we don't need to record about the database directory.
> 
> By the way the name XLogLogMiss.. is somewhat confusing. How about
> XLogReportMissingDir (named after report_invalid_page).

Agree with you.

Also your words remind me that we should skip the checking if the consistency 
point
is reached.

Here is a git diff against the previous patch. I’ll send out the new rebased 
patches after
the consensus is reached.

diff --git a/src/backend/access/transam/xlogutils.c 
b/src/backend/access/transam/xlogutils.c
index 7ade385965..c8fe3fe228 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -90,7 +90,7 @@ typedef struct xl_missing_dir
 static HTAB *missing_dir_tab = NULL;

 void
-XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
+XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path)
 {
        xl_missing_dir_key key;
        bool found;
@@ -103,16 +103,6 @@ XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
         */
        Assert(OidIsValid(spcNode));

-       if (reachedConsistency)
-       {
-               if (dbNode == InvalidOid)
-                       elog(PANIC, "cannot find directory %s (tablespace %d)",
-                                path, spcNode);
-               else
-                       elog(PANIC, "cannot find directory %s (tablespace %d 
database %d)",
-                                path, spcNode, dbNode);
-       }
-
        if (missing_dir_tab == NULL)
        {
                /* create hash table when first needed */
diff --git a/src/backend/commands/dbcommands.c 
b/src/backend/commands/dbcommands.c
index fbff422c3b..7bd6d4efd9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -2205,7 +2205,7 @@ dbase_redo(XLogReaderState *record)
                                                (errmsg("some useless files may 
be left behind in old database directory \"%s\"",
                                                                dst_path)));
                }
-               else
+               else if (!reachedConsistency)
                {
                        /*
                         * It is possible that drop tablespace record appearing 
later in
@@ -2221,7 +2221,7 @@ dbase_redo(XLogReaderState *record)
                        get_parent_directory(parent_path);
                        if (!(stat(parent_path, &st) == 0 && 
S_ISDIR(st.st_mode)))
                        {
-                               XLogLogMissingDir(xlrec->tablespace_id, 
InvalidOid, parent_path);
+                               XLogReportMissingDir(xlrec->tablespace_id, 
InvalidOid, parent_path);
                                skip = true;
                                ereport(WARNING,
                                                (errmsg("skipping create 
database WAL record"),
@@ -2239,9 +2239,10 @@ dbase_redo(XLogReaderState *record)
                 * noted above.  Moving a database from one tablespace may also 
be a
                 * partner in the crime.
                 */
-               if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
+               if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)) &&
+                       !reachedConsistency)
                {
-                       XLogLogMissingDir(xlrec->src_tablespace_id, 
xlrec->src_db_id, src_path);
+                       XLogReportMissingDir(xlrec->src_tablespace_id, 
xlrec->src_db_id, src_path);
                        skip = true;
                        ereport(WARNING,
                                        (errmsg("skipping create database WAL 
record"),
@@ -2311,7 +2312,8 @@ dbase_redo(XLogReaderState *record)
                                                (errmsg("some useless files may 
be left behind in old database directory \"%s\"",
                                                                dst_path)));

-                       XLogForgetMissingDir(xlrec->tablespace_ids[i], 
xlrec->db_id);
+                       if (!reachedConsistency)
+                               XLogForgetMissingDir(xlrec->tablespace_ids[i], 
xlrec->db_id);

                        pfree(dst_path);
                }
diff --git a/src/backend/commands/tablespace.c 
b/src/backend/commands/tablespace.c
index 294c9676b4..15eaa757cc 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1534,7 +1534,8 @@ tblspc_redo(XLogReaderState *record)
        {
                xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) 
XLogRecGetData(record);

-               XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+               if (!reachedConsistency)
+                       XLogForgetMissingDir(xlrec->ts_id, InvalidOid);

                XLogFlush(record->EndRecPtr);

diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index da561af5ab..6561d9cebe 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -23,7 +23,7 @@ extern void XLogDropDatabase(Oid dbid);
 extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
                                                                 BlockNumber 
nblocks);

-extern void XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path);
+extern void XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path);
 extern void XLogForgetMissingDir(Oid spcNode, Oid dbNode);
 extern void XLogCheckMissingDirs(void);

diff --git a/src/test/recovery/t/011_crash_recovery.pl 
b/src/test/recovery/t/011_crash_recovery.pl
index 748200ebb5..95eb6d26cc 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -141,7 +141,7 @@ $node_master->wait_for_catchup($node_standby, 'replay',
 $node_standby->safe_psql('postgres', 'CHECKPOINT');

 # Do immediate shutdown just after a sequence of CREAT DATABASE / DROP
-# DATABASE / DROP TABLESPACE. This causes CREATE DATBASE WAL records
+# DATABASE / DROP TABLESPACE. This causes CREATE DATABASE WAL records

Reply via email to