On 9/14/21, 7:23 AM, "Dipesh Pandit" <dipesh.pan...@gmail.com> wrote:
> I agree that when we are creating a .ready file we should compare 
> the current .ready file with the last .ready file to check if this file is 
> created out of order. We can store the state of the last .ready file 
> in shared memory and compare it with the current .ready file. I
> believe that archiver specific shared memory area can be used
> to store the state of the last .ready file unless I am missing
> something and this needs to be stored in a separate shared
> memory area.
>
> With this change, we have the flexibility to move the current archiver
> state out of shared memory and keep it local to archiver. I have 
> incorporated these changes and updated a new patch.

I wonder if this can be simplified even further.  If we don't bother
trying to catch out-of-order .ready files in XLogArchiveNotify() and
just depend on the per-checkpoint/restartpoint directory scans, we can
probably remove lastReadySegNo from archiver state completely.

+       /* Force a directory scan if we are archiving anything but a regular
+        * WAL file or if this WAL file is being created out-of-order.
+        */
+       if (!IsXLogFileName(xlog))
+               PgArchForceDirScan();
+       else
+       {
+               TimeLineID tli;
+               XLogSegNo last_segno;
+               XLogSegNo this_segno;
+
+               last_segno = PgArchGetLastReadySegNo();
+               XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+               /*
+                * Force a directory scan in case if this .ready file created 
out of
+                * order.
+                */
+               last_segno++;
+               if (last_segno != this_segno)
+                       PgArchForceDirScan();
+
+               PgArchSetLastReadySegNo(this_segno);
+       }

This is an interesting idea, but the "else" block here seems prone to
race conditions.  I think we'd have to hold arch_lck to prevent that.
But as I mentioned above, if we are okay with depending on the
fallback directory scans, I think we can remove the "else" block
completely.

+       /* Initialize the current state of archiver */
+       xlogState.lastSegNo = MaxXLogSegNo;
+       xlogState.lastTli = MaxTimeLineID;

It looks like we have two ways to force a directory scan.  We can
either set force_dir_scan to true, or lastSegNo can be set to
MaxXLogSegNo.  Why not just set force_dir_scan to true here so that we
only have one way to force a directory scan?

+                                       /*
+                                        * Failed to archive, make sure that 
archiver performs a
+                                        * full directory scan in the next 
cycle to avoid missing
+                                        * the WAL file which could not be 
archived due to some
+                                        * failure in current cycle.
+                                        */
+                                       PgArchForceDirScan();

Don't we also need to force a directory scan in the other cases we
return early from pgarch_ArchiverCopyLoop()?  We will have already
advanced the archiver state in pgarch_readyXlog(), so I think we'd end
up skipping files if we didn't.  For example, if archive_command isn't
set, we'll just return, and the next call to pgarch_readyXlog() might
return the next file.

+                       /* Continue directory scan until we find a regular WAL 
file */
+                       SpinLockAcquire(&PgArch->arch_lck);
+                       PgArch->force_dir_scan = true;
+                       SpinLockRelease(&PgArch->arch_lck);

nitpick: I think we should just call PgArchForceDirScan() here.

Nathan

Reply via email to