Looks like my previous reply was held for moderation (maybe due to my new email 
address).
I configured my pg account today using the new email address. I guess this 
email would be
held for moderation.

I’m now replying my previous reply email and attaching the new patch series.


On Jul 6, 2020, at 10:18 AM, Paul Guo 
<gu...@vmware.com<mailto:gu...@vmware.com>> wrote:

Thanks for the review. I’m now re-picking up the work. I modified the code 
following the comments.
Besides, I tweaked the test code a bit. There are several things I’m not 100% 
sure. Please see
my replies below.

On Jan 27, 2020, at 11:24 PM, Fujii Masao 
<masao.fu...@oss.nttdata.com<mailto:masao.fu...@oss.nttdata.com>> wrote:

On 2020/01/15 19:18, Paul Guo wrote:
I further fixed the last test failure (due to a small bug in the test, not in 
code). Attached are the new patch series. Let's see the CI pipeline result.

Thanks for updating the patches!

I started reading the 0003 patch.

The approach that the 0003 patch uses is not the perfect solution.
If the standby crashes after tblspc_redo() removes the directory and before
its subsequent COMMIT record is replayed, PANIC error would occur since
there can be some unresolved missing directory entries when we reach the
consistent state. The problem would very rarely happen, though...
Just idea; calling XLogFlush() to update the minimum recovery point just
before tblspc_redo() performs destroy_tablespace_directories() may be
safe and helpful for the problem?

Yes looks like an issue. My understanding is the below scenario.

XLogLogMissingDir()

XLogFlush() in redo (e.g. in a commit redo).  <- create a minimum recovery 
point (we call it LSN_A).

tblspc_redo()->XLogForgetMissingDir()
       <- If we panic immediately after we remove the directory in tblspc_redo()
       <- when we do replay during crash-recovery, we will check consistency at 
LSN_A and thus PANIC inXLogCheckMissingDirs()

commit

We should add a XLogFlush() in tblspc_redo(). This brings several other 
questions to my minds also.


1. Should we call XLogFlush() in dbase_redo()  for XLOG_DBASE_DROP also?
   It calls both XLogDropDatabase() and XLogForgetMissingDir, which seem to 
have this issue also?

2. xact_redo_abort() calls DropRelationFiles() also. Why do not we call 
XLogFlush() there?



- appendStringInfo(buf, "copy dir %u/%u to %u/%u",
-  xlrec->src_tablespace_id, xlrec->src_db_id,
-  xlrec->tablespace_id, xlrec->db_id);
+ dbpath1 = GetDatabasePath(xlrec->src_db_id,  xlrec->src_tablespace_id);
+ dbpath2 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+ appendStringInfo(buf, "copy dir %s to %s", dbpath1, dbpath2);
+ pfree(dbpath2);
+ pfree(dbpath1);

If the patch is for the bug fix and would be back-ported, the above change
would lead to change pg_waldump's output for CREATE/DROP DATABASE between
minor versions. IMO it's better to avoid such change and separate the above
as a separate patch only for master.

I know we do not want wal format between minor releases, but does wal 
description string change
between minor releases affect users? Anyone I’ll extract this part into a 
separate patch in the series
since this change is actually independent of the other changes..


- appendStringInfo(buf, " %u/%u",
-  xlrec->tablespace_ids[i], xlrec->db_id);
+ {
+ dbpath1 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);
+ appendStringInfo(buf,  "%s", dbpath1);
+ pfree(dbpath1);
+ }

Same as above.

BTW, the above "%s" should be " %s", i.e., a space character needs to be
appended to the head of "%s”.

OK


+ get_parent_directory(parent_path);
+ if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+ {
+ XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, dst_path);

The third argument of XLogLogMissingDir() should be parent_path instead of
dst_path?

The argument is for debug message printing so both should be fine, but 
admittedly we are
logging for the tablespace directory so parent_path might be better.


+ if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) == NULL)
+ elog(DEBUG2, "dir %s tablespace %d database %d is not missing",
+  path, spcNode, dbNode);

I think that this elog() is useless and rather confusing.

OK. Modified.


+ XLogForgetMissingDir(xlrec->ts_id, InvalidOid, "");

The third argument should be set to the actual path instead of an empty
string. Otherwise XLogForgetMissingDir() may emit a confusing DEBUG2
message. Or the third argument of XLogForgetMissingDir() should be removed
and the path in the DEBUG2 message should be calculated from the spcNode
and dbNode in the hash entry in XLogForgetMissingDir().

I’m now removing the third argument. Use GetDatabasePath() to get the path if 
database did I snot InvalidOid.


+#include "common/file_perm.h"

This seems not necessary.

Right.

Attachment: v10-0001-Support-node-initialization-from-backup-with-tab.patch
Description: v10-0001-Support-node-initialization-from-backup-with-tab.patch

Attachment: v10-0002-Tests-to-replay-create-database-operation-on-sta.patch
Description: v10-0002-Tests-to-replay-create-database-operation-on-sta.patch

Attachment: v10-0003-Fix-replay-of-create-database-records-on-standby.patch
Description: v10-0003-Fix-replay-of-create-database-records-on-standby.patch

Attachment: v10-0004-Fix-database-create-drop-wal-description.patch
Description: v10-0004-Fix-database-create-drop-wal-description.patch

Reply via email to