Hi,

I'd be curious if anybody wants to argue for keeping the clock sweep. Except
for the have_free_buffer() use in autoprewarm, it's a rather trivial
patch. And I really couldn't measure regressions above the noise level, even
if absurdly extreme use cases.


On 2025-07-17 14:35:13 -0400, Greg Burd wrote:
> On Fri, Jul 11, 2025, at 2:52 PM, Andres Freund wrote:
> > I think we'll likely need something to replace it.
>
> Fair, this (v5) patch doesn't yet try to address this.
>
> > TBH, I'm not convinced that autoprewarm using have_free_buffer() is quite
> > right. The goal of the use have_free_buffer() is obviously to stop
> > prewarming
> > shared buffers if doing so would just evict buffers.  But it's not clear
> > to me
> > that we should just stop when there aren't any free buffers - what if the
> > previous buffer contents aren't the right ones?  It'd make more sense to
> > me to
> > stop autoprewarm once NBuffers have been prewarmed...
>
> I had the same high level reaction, that autoprewarm was leveraging
> something
> convenient but not necessarily required or even correct.  I'd considered
> using
> NBuffers as you describe due to similar intuitions, I'll dig into that idea
> for
> the next revision after I get to know autoprewarm a bit better.

Cool. I do think that'll be good enough.



> > The most obvious way around this would be to make the clock hand a 64bit
> > atomic, which would avoid the need to have a separate tracker for the
> > number
> > of passes.  Unfortunately doing so would require doing a modulo
> > operation each
> > clock tick, which I think would likely be too expensive on common
> > platforms -
> > on small shared_buffers I actually see existing, relatively rarely
> > reached,
> > modulo in ClockSweepTick() show up on a Skylake-X system.
>
> So, this idea came back to me today as I tossed out the union branch and
> started
> over.
>
> a) can't require a power of 2 for NBuffers
> b) would like a power of 2 for NBuffers to make a few things more efficient
> c) a simple uint64 atomic counter would simplify things
>
> The attached (v5) patch takes this approach *and* avoids the modulo you were
> concerned with.  My approach is to have nextVictimBuffer as a uint64 that
> only
> increments (and at some point 200 years or so might wrap around, but I
> digress).
> To get the actual "victim" you modulo that, but not with "%" you call
> clock_modulo().  In that function I use a "next power of 2" value rather
> than
> NBuffers to efficiently find the modulo and adjust for the actual value.
> Same
> for completePasses which is now a function clock_passes() that does similar
> trickery and returns the number of times the counter (nextVictimBuffer) has
> "wrapped" around modulo NBuffers.

Yea, that could work!  It'd be interesting to see some performance numbers for
this...


> Now that both values exist in the same uint64 it can be the atomic vessel
> that coordinates them, no synchronization problems at all and no requirement
> for the buffer_strategy_lock.

Nice!



> > I think while at it, we should make ClockSweepTick() decrement
> > nextVictimBuffer by atomically subtracting NBuffers, rather than using
> > CAS. I
> > recently noticed that the CAS sometimes has to retry a fair number of
> > times,
> > which in turn makes the `victim % NBuffers` show up in profiles.
>
> In my (v5) patch there is one CAS that increments NBuffers.  All other
> operations on NBuffers are atomic reads.  The modulo you mention is gone
> entirely, unnecessary AFAICT.

There shouldn't be any CASes needed now, right? Just a fetch-add? The latter
often scales *way* better under contention.

[Looks at the patch ...]

Which I think is true in your patch, I don't see any CAS.



> Meanwhile, the tests except for Windows pass [2] for this new patch [3].
> I'll dig into the Windows issues next week as well.

FWIW, there are backtraces generated on windows. E.g.

https://api.cirrus-ci.com/v1/artifact/task/6327899394932736/crashlog/crashlog-postgres.exe_00c0_2025-07-17_19-19-00-008.txt

000000cd`827fdea0 00007ff7`6ad82f88     ucrtbased!abort(void)+0x5a 
[minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
000000cd`827fdee0 00007ff7`6aae2b7c     postgres!ExceptionalCondition(
                        char * conditionName = 0x00007ff7`6b2a4cb8 "result < 
NBuffers",
                        char * fileName = 0x00007ff7`6b2a4c88 
"../src/backend/storage/buffer/freelist.c",
                        int lineNumber = 0n139)+0x78 
[c:\cirrus\src\backend\utils\error\assert.c @ 67]
000000cd`827fdf20 00007ff7`6aae272c     postgres!clock_modulo(
                        unsigned int64 counter = 0x101)+0x6c 
[c:\cirrus\src\backend\storage\buffer\freelist.c @ 139]
000000cd`827fdf60 00007ff7`6aad8647     postgres!StrategySyncStart(
                        unsigned int * complete_passes = 0x000000cd`827fdfc0,
                        unsigned int * num_buf_alloc = 
0x000000cd`827fdfcc)+0x2c [c:\cirrus\src\backend\storage\buffer\freelist.c @ 
300]
000000cd`827fdfa0 00007ff7`6aa254a3     postgres!BgBufferSync(
                        struct WritebackContext * wb_context = 
0x000000cd`827fe180)+0x37 [c:\cirrus\src\backend\storage\buffer\bufmgr.c @ 3649]
000000cd`827fe030 00007ff7`6aa278a7     postgres!BackgroundWriterMain(
                        void * startup_data = 0x00000000`00000000,
                        unsigned int64 startup_data_len = 0)+0x243 
[c:\cirrus\src\backend\postmaster\bgwriter.c @ 236]
000000cd`827ff5a0 00007ff7`6a8daf19     postgres!SubPostmasterMain(
                        int argc = 0n3,
                        char ** argv = 0x0000028f`e75d24d0)+0x2f7 
[c:\cirrus\src\backend\postmaster\launch_backend.c @ 714]
000000cd`827ff620 00007ff7`6af0f5a9     postgres!main(
                        int argc = 0n3,
                        char ** argv = 0x0000028f`e75d24d0)+0x329 
[c:\cirrus\src\backend\main\main.c @ 222]

I.e. your new assertion failed for some reason that i can't *immediately* see.



> @@ -3664,15 +3664,25 @@ BgBufferSync(WritebackContext *wb_context)
>
>       /*
>        * Compute strategy_delta = how many buffers have been scanned by the
> -      * clock sweep since last time.  If first time through, assume none. 
> Then
> -      * see if we are still ahead of the clock sweep, and if so, how many
> +      * clock-sweep since last time.  If first time through, assume none. 
> Then
> +      * see if we are still ahead of the clock-sweep, and if so, how many
>        * buffers we could scan before we'd catch up with it and "lap" it. 
> Note:
>        * weird-looking coding of xxx_passes comparisons are to avoid bogus
>        * behavior when the passes counts wrap around.
>        */
>       if (saved_info_valid)
>       {
> -             int32           passes_delta = strategy_passes - 
> prev_strategy_passes;
> +             int32           passes_delta;
> +
> +             if (unlikely(prev_strategy_passes > strategy_passes))
> +             {
> +                     /* wrap-around case */
> +                     passes_delta = (int32) (UINT32_MAX - 
> prev_strategy_passes + strategy_passes);
> +             }
> +             else
> +             {
> +                     passes_delta = (int32) (strategy_passes - 
> prev_strategy_passes);
> +             }
>
>               strategy_delta = strategy_buf_id - prev_strategy_buf_id;
>               strategy_delta += (long) passes_delta * NBuffers;

That seems somewhat independent of the rest of the change, or am I missing 
something?


> +static uint32 NBuffersPow2;          /* NBuffers rounded up to the next 
> power of 2 */
> +static uint32 NBuffersPow2Shift;     /* Amount to bitshift NBuffers for
> +                                                                      * 
> division */

For performance in ClockSweepTick() it might more sense to store the mask
(i.e. NBuffersPow2 - 1), rather than the actual power of two.

Greetings,

Andres Freund


Reply via email to