Hi, I'd be curious if anybody wants to argue for keeping the clock sweep. Except for the have_free_buffer() use in autoprewarm, it's a rather trivial patch. And I really couldn't measure regressions above the noise level, even if absurdly extreme use cases.
On 2025-07-17 14:35:13 -0400, Greg Burd wrote: > On Fri, Jul 11, 2025, at 2:52 PM, Andres Freund wrote: > > I think we'll likely need something to replace it. > > Fair, this (v5) patch doesn't yet try to address this. > > > TBH, I'm not convinced that autoprewarm using have_free_buffer() is quite > > right. The goal of the use have_free_buffer() is obviously to stop > > prewarming > > shared buffers if doing so would just evict buffers. But it's not clear > > to me > > that we should just stop when there aren't any free buffers - what if the > > previous buffer contents aren't the right ones? It'd make more sense to > > me to > > stop autoprewarm once NBuffers have been prewarmed... > > I had the same high level reaction, that autoprewarm was leveraging > something > convenient but not necessarily required or even correct. I'd considered > using > NBuffers as you describe due to similar intuitions, I'll dig into that idea > for > the next revision after I get to know autoprewarm a bit better. Cool. I do think that'll be good enough. > > The most obvious way around this would be to make the clock hand a 64bit > > atomic, which would avoid the need to have a separate tracker for the > > number > > of passes. Unfortunately doing so would require doing a modulo > > operation each > > clock tick, which I think would likely be too expensive on common > > platforms - > > on small shared_buffers I actually see existing, relatively rarely > > reached, > > modulo in ClockSweepTick() show up on a Skylake-X system. > > So, this idea came back to me today as I tossed out the union branch and > started > over. > > a) can't require a power of 2 for NBuffers > b) would like a power of 2 for NBuffers to make a few things more efficient > c) a simple uint64 atomic counter would simplify things > > The attached (v5) patch takes this approach *and* avoids the modulo you were > concerned with. My approach is to have nextVictimBuffer as a uint64 that > only > increments (and at some point 200 years or so might wrap around, but I > digress). > To get the actual "victim" you modulo that, but not with "%" you call > clock_modulo(). In that function I use a "next power of 2" value rather > than > NBuffers to efficiently find the modulo and adjust for the actual value. > Same > for completePasses which is now a function clock_passes() that does similar > trickery and returns the number of times the counter (nextVictimBuffer) has > "wrapped" around modulo NBuffers. Yea, that could work! It'd be interesting to see some performance numbers for this... > Now that both values exist in the same uint64 it can be the atomic vessel > that coordinates them, no synchronization problems at all and no requirement > for the buffer_strategy_lock. Nice! > > I think while at it, we should make ClockSweepTick() decrement > > nextVictimBuffer by atomically subtracting NBuffers, rather than using > > CAS. I > > recently noticed that the CAS sometimes has to retry a fair number of > > times, > > which in turn makes the `victim % NBuffers` show up in profiles. > > In my (v5) patch there is one CAS that increments NBuffers. All other > operations on NBuffers are atomic reads. The modulo you mention is gone > entirely, unnecessary AFAICT. There shouldn't be any CASes needed now, right? Just a fetch-add? The latter often scales *way* better under contention. [Looks at the patch ...] Which I think is true in your patch, I don't see any CAS. > Meanwhile, the tests except for Windows pass [2] for this new patch [3]. > I'll dig into the Windows issues next week as well. FWIW, there are backtraces generated on windows. E.g. https://api.cirrus-ci.com/v1/artifact/task/6327899394932736/crashlog/crashlog-postgres.exe_00c0_2025-07-17_19-19-00-008.txt 000000cd`827fdea0 00007ff7`6ad82f88 ucrtbased!abort(void)+0x5a [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77] 000000cd`827fdee0 00007ff7`6aae2b7c postgres!ExceptionalCondition( char * conditionName = 0x00007ff7`6b2a4cb8 "result < NBuffers", char * fileName = 0x00007ff7`6b2a4c88 "../src/backend/storage/buffer/freelist.c", int lineNumber = 0n139)+0x78 [c:\cirrus\src\backend\utils\error\assert.c @ 67] 000000cd`827fdf20 00007ff7`6aae272c postgres!clock_modulo( unsigned int64 counter = 0x101)+0x6c [c:\cirrus\src\backend\storage\buffer\freelist.c @ 139] 000000cd`827fdf60 00007ff7`6aad8647 postgres!StrategySyncStart( unsigned int * complete_passes = 0x000000cd`827fdfc0, unsigned int * num_buf_alloc = 0x000000cd`827fdfcc)+0x2c [c:\cirrus\src\backend\storage\buffer\freelist.c @ 300] 000000cd`827fdfa0 00007ff7`6aa254a3 postgres!BgBufferSync( struct WritebackContext * wb_context = 0x000000cd`827fe180)+0x37 [c:\cirrus\src\backend\storage\buffer\bufmgr.c @ 3649] 000000cd`827fe030 00007ff7`6aa278a7 postgres!BackgroundWriterMain( void * startup_data = 0x00000000`00000000, unsigned int64 startup_data_len = 0)+0x243 [c:\cirrus\src\backend\postmaster\bgwriter.c @ 236] 000000cd`827ff5a0 00007ff7`6a8daf19 postgres!SubPostmasterMain( int argc = 0n3, char ** argv = 0x0000028f`e75d24d0)+0x2f7 [c:\cirrus\src\backend\postmaster\launch_backend.c @ 714] 000000cd`827ff620 00007ff7`6af0f5a9 postgres!main( int argc = 0n3, char ** argv = 0x0000028f`e75d24d0)+0x329 [c:\cirrus\src\backend\main\main.c @ 222] I.e. your new assertion failed for some reason that i can't *immediately* see. > @@ -3664,15 +3664,25 @@ BgBufferSync(WritebackContext *wb_context) > > /* > * Compute strategy_delta = how many buffers have been scanned by the > - * clock sweep since last time. If first time through, assume none. > Then > - * see if we are still ahead of the clock sweep, and if so, how many > + * clock-sweep since last time. If first time through, assume none. > Then > + * see if we are still ahead of the clock-sweep, and if so, how many > * buffers we could scan before we'd catch up with it and "lap" it. > Note: > * weird-looking coding of xxx_passes comparisons are to avoid bogus > * behavior when the passes counts wrap around. > */ > if (saved_info_valid) > { > - int32 passes_delta = strategy_passes - > prev_strategy_passes; > + int32 passes_delta; > + > + if (unlikely(prev_strategy_passes > strategy_passes)) > + { > + /* wrap-around case */ > + passes_delta = (int32) (UINT32_MAX - > prev_strategy_passes + strategy_passes); > + } > + else > + { > + passes_delta = (int32) (strategy_passes - > prev_strategy_passes); > + } > > strategy_delta = strategy_buf_id - prev_strategy_buf_id; > strategy_delta += (long) passes_delta * NBuffers; That seems somewhat independent of the rest of the change, or am I missing something? > +static uint32 NBuffersPow2; /* NBuffers rounded up to the next > power of 2 */ > +static uint32 NBuffersPow2Shift; /* Amount to bitshift NBuffers for > + * > division */ For performance in ClockSweepTick() it might more sense to store the mask (i.e. NBuffersPow2 - 1), rather than the actual power of two. Greetings, Andres Freund