Re: Fix calculations on WAL recycling.
On Mon, Jul 23, 2018 at 07:55:57PM +0900, Kyotaro HORIGUCHI wrote: > At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier > wrote in <20180723065916.gi2...@paquier.xyz> >> This is an open item for v11. > > Mmm. Thanks. I wrongly thought this was v10 item. Removed this > from the next CF. Thanks for updating the CF app. >> By doing so, you would notice that the oldest WAL segment does not get >> recycled after the checkpoint, while it should as the redo pointer is >> always checkpoint generated. What happens is that this oldest segment >> gets recycled every two checkpoints. > > (I'm not sure I'm getting it correctly..) I see the old segments > recycled. When I see pg_switch_wal() returns 0/12.., > pg_walfile_name is ...13 and segment files for 13 and 14 are > found in pg_wal directory. That is, seg 12 was recycled as seg > 14. log_checkpoint=on shows every checkpoint recycles 1 segment > in the case. With your patch applied I see one segment recycled after each checkpoint, which is correct. On HEAD, you would see no segments, followed by 2 segments recycled. But I also see sometimes one segment recycled. >> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect, >> this still uses PriorRedoPtr so the bug is not fixed for standbys. The >> tweaks for ThisTimeLineID also need to be out of the loop where >> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation >> should be kept. > > Agreed. While I reconsidered this, I noticed that the estimated > checkpoint distance is 0 when PriorRedoPtr is invalid. This means > that the first checkpoint/restartpoint tries to reduce WAL > segments to min_wal_size. However, it happens only at initdb time > and makes no substantial behavior change so I made the change > ignoring the difference. Yes, same analysis here. >> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is >> not. > > Thank you for the comments and suggestions. After all, I did the > following things in the attached patch. > > - Reverted the change on timeline switching. (Removed the (3)) > - Fixed CreateRestartPoint to do the same thing with CreateCheckPoint. > - Both CreateRestart/CheckPoint now tries trimming of WAL > segments even for the first time. Thanks, pushed after some comment tweaks and fixing the variable names at the top of xlog.c for the static declarations. Perhaps we can do more refactoring here by moving all the segment calculation logic directly in RemoveOldXlogFiles, but that makes the end LSN calculation a bit blurry so I kept things as you proposed in version 3 of the patch. -- Michael signature.asc Description: PGP signature
Re: Fix calculations on WAL recycling.
At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier wrote in <20180723065916.gi2...@paquier.xyz> > On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote: > > I'll register this to the next commitfest. > > > > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyot...@lab.ntt.co.jp > > This is an open item for v11. Mmm. Thanks. I wrongly thought this was v10 item. Removed this from the next CF. Thank you for taking a look. > >> While considering this, I found a bug in 4b0d28de06, which > >> removed prior checkpoint from control file. It actually trims the > >> segments before the last checkpoint's redo segment but recycling > >> is still considered based on the *prevous* checkpoint. As the > >> result min_wal_size doesn't work as told. Specifically, setting > >> min/max_wal_size to 48MB and advance four or more segments then > >> two checkpoints leaves just one segment, which is less than > >> min_wal_size. > >> > >> The attached patch fixes that. One arguable point on this would > >> be the removal of the behavior when RemoveXLogFile(name, > >> InvalidXLogRecPtr, ..). > >> > >> The only place calling the function with the parameter is > >> timeline switching. Previously unconditionally 10 segments are > >> recycled after switchpoint but the reason for the behavior is we > >> didn't have the information on previous checkpoint at hand at the > >> time. But now we can use the timeline switch point as the > >> approximate of the last checkpoint's redo point and this allows > >> us to use min/max_wal_size properly at the time. > > I have been looking at that, and tested with this simple scenario: > create table aa (a int); > > Then just repeat the following SQLs: > insert into aa values (1); > select pg_switch_wal(); > checkpoint; > select pg_walfile_name(pg_current_wal_lsn()); > > By doing so, you would notice that the oldest WAL segment does not get > recycled after the checkpoint, while it should as the redo pointer is > always checkpoint generated. What happens is that this oldest segment > gets recycled every two checkpoints. (I'm not sure I'm getting it correctly..) I see the old segments recycled. When I see pg_switch_wal() returns 0/12.., pg_walfile_name is ...13 and segment files for 13 and 14 are found in pg_wal directory. That is, seg 12 was recycled as seg 14. log_checkpoint=on shows every checkpoint recycles 1 segment in the case. > Then I had a look at the proposed patch, with a couple of comments. > > - if (PriorRedoPtr == InvalidXLogRecPtr) > - recycleSegNo = endlogSegNo + 10; > - else > - recycleSegNo = XLOGfileslop(PriorRedoPtr); > + recycleSegNo = XLOGfileslop(RedoRecPtr); > I think that this is a new behavior, and should not be changed, see > point 3 below. > > In CreateCheckPoint(), the removal of past WAL segments is always going > to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling > should happen also when PriorRedoPtr is InvalidXLogRecPtr, no? Yes. The reason for the change was the change of RemoveNonParentXlogFiles that I made and it is irrelevant to this bug fix (and rather breaking as you pointed..). So, reverted it. > /* Trim from the last checkpoint, not the last - 1 */ > This comment could be adjusted, let's remove "not the last - 1" . Oops! Thanks. The comment has finally vanished and melded into another comment just above. | * Delete old log files not required by the last checkpoint and recycle | * them > The calculation of _logSegNo in CreateRestartPoint is visibly incorrect, > this still uses PriorRedoPtr so the bug is not fixed for standbys. The > tweaks for ThisTimeLineID also need to be out of the loop where > PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation > should be kept. Agreed. While I reconsidered this, I noticed that the estimated checkpoint distance is 0 when PriorRedoPtr is invalid. This means that the first checkpoint/restartpoint tries to reduce WAL segments to min_wal_size. However, it happens only at initdb time and makes no substantial behavior change so I made the change ignoring the difference. > Finally, in summary, this patch is doing actually three things: > 1) Rename a couple of variables which refer incorrectly to the prior > checkpoint so as they refer to the last checkpoint generated. > 2) Make sure that WAL recycling/removal happens based on the last > checkpoint generated, which is the one just generated when past WAL > segments are cleaned up as post-operation actions. > 3) Enforce the recycling point to be the switch point instead of > arbitrarily recycle 10 segments, which is what b2a5545b has introduced. > Recycling at exactly the switch point is wrong, as the redo LSN of the > previous checkpoint may not be at the same segment when a timeline has > switched, so you would finish with recycling segments which are still > needed if an instance needs be crash-recovered post-promotion without > a completed post-recovery
Re: Fix calculations on WAL recycling.
On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote: > I'll register this to the next commitfest. > > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyot...@lab.ntt.co.jp This is an open item for v11. >> While considering this, I found a bug in 4b0d28de06, which >> removed prior checkpoint from control file. It actually trims the >> segments before the last checkpoint's redo segment but recycling >> is still considered based on the *prevous* checkpoint. As the >> result min_wal_size doesn't work as told. Specifically, setting >> min/max_wal_size to 48MB and advance four or more segments then >> two checkpoints leaves just one segment, which is less than >> min_wal_size. >> >> The attached patch fixes that. One arguable point on this would >> be the removal of the behavior when RemoveXLogFile(name, >> InvalidXLogRecPtr, ..). >> >> The only place calling the function with the parameter is >> timeline switching. Previously unconditionally 10 segments are >> recycled after switchpoint but the reason for the behavior is we >> didn't have the information on previous checkpoint at hand at the >> time. But now we can use the timeline switch point as the >> approximate of the last checkpoint's redo point and this allows >> us to use min/max_wal_size properly at the time. I have been looking at that, and tested with this simple scenario: create table aa (a int); Then just repeat the following SQLs: insert into aa values (1); select pg_switch_wal(); checkpoint; select pg_walfile_name(pg_current_wal_lsn()); By doing so, you would notice that the oldest WAL segment does not get recycled after the checkpoint, while it should as the redo pointer is always checkpoint generated. What happens is that this oldest segment gets recycled every two checkpoints. Then I had a look at the proposed patch, with a couple of comments. - if (PriorRedoPtr == InvalidXLogRecPtr) - recycleSegNo = endlogSegNo + 10; - else - recycleSegNo = XLOGfileslop(PriorRedoPtr); + recycleSegNo = XLOGfileslop(RedoRecPtr); I think that this is a new behavior, and should not be changed, see point 3 below. In CreateCheckPoint(), the removal of past WAL segments is always going to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling should happen also when PriorRedoPtr is InvalidXLogRecPtr, no? /* Trim from the last checkpoint, not the last - 1 */ This comment could be adjusted, let's remove "not the last - 1" . The calculation of _logSegNo in CreateRestartPoint is visibly incorrect, this still uses PriorRedoPtr so the bug is not fixed for standbys. The tweaks for ThisTimeLineID also need to be out of the loop where PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation should be kept. Finally, in summary, this patch is doing actually three things: 1) Rename a couple of variables which refer incorrectly to the prior checkpoint so as they refer to the last checkpoint generated. 2) Make sure that WAL recycling/removal happens based on the last checkpoint generated, which is the one just generated when past WAL segments are cleaned up as post-operation actions. 3) Enforce the recycling point to be the switch point instead of arbitrarily recycle 10 segments, which is what b2a5545b has introduced. Recycling at exactly the switch point is wrong, as the redo LSN of the previous checkpoint may not be at the same segment when a timeline has switched, so you would finish with recycling segments which are still needed if an instance needs be crash-recovered post-promotion without a completed post-recovery checkpoint. In short this is dangerous. I'll let Heikki comment on that, but I think that you should fetch the last redo LSN instead, still you need to be worried about checkpoints running in parallel of the startup process, so I would stick with the current logic. 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is not. -- Michael signature.asc Description: PGP signature
Fix calculations on WAL recycling.
I'll register this to the next commitfest. https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyot...@lab.ntt.co.jp > While considering this, I found a bug in 4b0d28de06, which > removed prior checkpoint from control file. It actually trims the > segments before the last checkpoint's redo segment but recycling > is still considered based on the *prevous* checkpoint. As the > result min_wal_size doesn't work as told. Specifically, setting > min/max_wal_size to 48MB and advance four or more segments then > two checkpoints leaves just one segment, which is less than > min_wal_size. > > The attached patch fixes that. One arguable point on this would > be the removal of the behavior when RemoveXLogFile(name, > InvalidXLogRecPtr, ..). > > The only place calling the function with the parameter is > timeline switching. Previously unconditionally 10 segments are > recycled after switchpoint but the reason for the behavior is we > didn't have the information on previous checkpoint at hand at the > time. But now we can use the timeline switch point as the > approximate of the last checkpoint's redo point and this allows > us to use min/max_wal_size properly at the time. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From f2b1a0b6360263d4ddf725075daf4b56800e3e18 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 19 Jul 2018 12:13:56 +0900 Subject: [PATCH] Fix calculation base of WAL recycling The commit 4b0d28de06 removed the prior checkpoint and related things but that leaves WAL recycling based on the prior checkpoint. This makes max_wal_size and min_wal_size work incorrectly. This patch makes WAL recycling be based on the last checkpoint. --- src/backend/access/transam/xlog.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4049deb968..d7a61af8f1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra) * XLOG segments? Returns the highest segment that should be preallocated. */ static XLogSegNo -XLOGfileslop(XLogRecPtr PriorRedoPtr) +XLOGfileslop(XLogRecPtr RedoRecPtr) { XLogSegNo minSegNo; XLogSegNo maxSegNo; @@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr) * correspond to. Always recycle enough segments to meet the minimum, and * remove enough segments to stay below the maximum. */ - minSegNo = PriorRedoPtr / wal_segment_size + + minSegNo = RedoRecPtr / wal_segment_size + ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1; - maxSegNo = PriorRedoPtr / wal_segment_size + + maxSegNo = RedoRecPtr / wal_segment_size + ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1; /* @@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr) /* add 10% for good measure. */ distance *= 1.10; - recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / + recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) / wal_segment_size); if (recycleSegNo < minSegNo) @@ -3896,12 +3896,12 @@ RemoveTempXlogFiles(void) /* * Recycle or remove all log files older or equal to passed segno. * - * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the - * redo pointer of the previous checkpoint. These are used to determine + * endptr is current (or recent) end of xlog, and RedoRecPtr is the + * redo pointer of the last checkpoint. These are used to determine * whether we want to recycle rather than delete no-longer-wanted log files. */ static void -RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) +RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr) { DIR *xldir; struct dirent *xlde; @@ -3944,7 +3944,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) /* Update the last removed location in shared memory first */ UpdateLastRemovedPtr(xlde->d_name); -RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr); +RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr); } } } @@ -4006,9 +4006,11 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI) * remove it yet. It should be OK to remove it - files that are * not part of our timeline history are not required for recovery * - but seems safer to let them be archived and removed later. + * Here, switchpoint is a good approximate of RedoRecPtr for + * RemoveXlogFile since we have just done timeline switching. */ if (!XLogArchiveIsReady(xlde->d_name)) -RemoveXlogFile(xlde->d_name, InvalidXLogRecPtr, switchpoint); +RemoveXlogFile(xlde->d_name, switchpoint, switchpoint); } } @@ -4018,14 +4020,12 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI) /* * Recycle or remove a log file that's no longer