On Tuesday, October 02, 2012 4:21 PM Heikki Linnakangas wrote: > Thanks for the thorough review! I committed the xlog.c refactoring patch > now. Attached is a new version of the main patch, comments on specific > points below. I didn't adjust the docs per your comments yet, will do > that next.
I have some doubts regarding the comments fixed by you and some more new review comments. After this I shall focus majorly towards testing of this Patch. > On 01.10.2012 05:25, Amit kapila wrote: > > 1. In function readTimeLineHistory(), > > two mechanisms are used to fetch timeline from history file > > + sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, > > &switchpoint_lo); > > + > > > 8. In function writeTimeLineHistoryFile(), will it not be better to > > directly write rather than to later do pg_fsync(). > > as it is just one time write. > > Not sure I understood this right, but writeTimeLineHistoryFile() needs > to avoid putting a corrupt, e.g incomplete, file in pg_xlog. The same as > writeTimeLineHistory(). That's why the write+fsync+rename dance is > needed. > Why fsync, isn't the above purpose be resolved if write directly writes to file and then rename. > > 21. @@ -2411,27 +2411,6 @@ reaper(SIGNAL_ARGS) > > > > a. won't it impact stop of online basebackup functionality > because earlier on promotion > > I think this code will stop walsenders and basebackup stop > will also give error in such cases. > > Hmm, should a base backup be aborted when the standby is promoted? Does > the promotion render the backup corrupt? I think currently it does so. Pls refer 1. do_pg_stop_backup(char *labelfile, bool waitforarchive) { .. if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("the standby was promoted during online backup"), errhint("This means that the backup being taken is corrupt " "and should not be used. " "Try taking another online backup."))); .. } 2. In documentation of pg_basebackup there is a Note: .If the standby is promoted to the master during online backup, the backup fails. New Ones --------------- 35.WalSenderMain(void) { .. + if (walsender_shutdown_requested) + ereport(FATAL, + (errcode(ERRCODE_ADMIN_SHUTDOWN), + errmsg("terminating replication connection due to administrator command"))); + + /* Tell the client that we are ready to receive commands */ + ReadyForQuery(DestRemote); + .. + if (walsender_shutdown_requested) + ereport(FATAL, + (errcode(ERRCODE_ADMIN_SHUTDOWN), + errmsg("terminating replication connection due to administrator command"))); + is it necessary to check walsender_shutdown_requested 2 times in a loop, if yes, then can we write comment why it is important to check it again. 35. +SendTimeLineHistory(TimeLineHistoryCmd *cmd) { .. + fd = PathNameOpenFile(path, O_RDONLY | PG_BINARY, 0666); error handling for fd < 0 is missing. 36.+SendTimeLineHistory(TimeLineHistoryCmd *cmd) { .. if (nread <= 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", + path))); FileClose should be done in error case as well. 37. static void XLogSend(char *msgbuf, bool *caughtup) { .. if (currentTimeLineIsHistoric && XLByteLE(currentTimeLineValidUpto, sentPtr)) { /* * This was a historic timeline, and we've reached the point where * we switched to the next timeline. Let the client know we've * reached the end of this timeline, and what the next timeline is. */ /* close the current file. */ if (sendFile >= 0) close(sendFile); sendFile = -1; *caughtup = true; /* Send CopyDone */ pq_putmessage_noblock('c', NULL, 0); streamingDoneSending = true; return; } } I am not able to understand after sending CopyDone message from above code, how walreceiver is handling it and then replying it a CopyDone message. Basically I want to know the walreceiver code which handles it? 38. static void WalSndLoop(void) { @@ -756,18 +898,24 @@ WalSndLoop(void) /* Normal exit from the walsender is here */ if (walsender_shutdown_requested) - { - /* Inform the standby that XLOG streaming is done */ - pq_puttextmessage('C', "COPY 0"); - pq_flush(); - - proc_exit(0); - } + ereport(FATAL, + (errcode(ERRCODE_ADMIN_SHUTDOWN), + errmsg("terminating replication connection due to administrator command"))); What is the reason of removal of sending above message to standby when shutdown was requested? 39. WalSndLoop(void) { .. /* If nothing remains to be sent right now ... */ if (caughtup && !pq_is_send_pending()) { /* * If we're in catchup state, move to streaming. This is an * important state change for users to know about, since before * this point data loss might occur if the primary dies and we * need to failover to the standby. The state change is also * important for synchronous replication, since commits that * started to wait at that point might wait for some time. */ if (MyWalSnd->state == WALSNDSTATE_CATCHUP) { ereport(DEBUG1, (errmsg("standby \"%s\" has now caught up with primary", application_name))); WalSndSetState(WALSNDSTATE_STREAMING); } .. } After new implementation, I think above if loop [if (caughtup && !pq_is_send_pending())] can be true when the standby has not actually caught up as now it sends the data from previous timelines. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers