Hi Kirill,

Following up on my earlier note, I implemented the proposed HASH overflow
page reuse enhancement. Recently freed overflow pages are recorded in
_hash_freeovflpage(), and _hash_addovflpage() now prefers reusing those
pages during allocation before falling back to the bitmap scan.
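To illustrate the kind of workload this targets, here is a toy sequence
(table t and column i are just placeholders, in the spirit of your earlier
test script): the DELETE plus VACUUM frees overflow pages through
_hash_freeovflpage(), and the second round of inserts makes
_hash_addovflpage() allocate overflow pages again, which is where the reuse
cache can help.

create table t (i int);
insert into t select g % 1000 from generate_series(1, 1000000) g;  -- many duplicates build overflow chains
create index on t using hash (i);

delete from t where i < 500;
vacuum t;                      -- overflow pages are freed here

insert into t select g % 1000 from generate_series(1, 500000) g;   -- new overflow pages are allocated again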
The change is backend-local and allocation-only, with no WAL or on-disk
format changes. I verified correctness using index build/drop and VACUUM
cycles, and confirmed WAL neutrality by comparing the WAL generated during
HASH index builds with and without this change; there was no observable
difference beyond normal noise (the exact check is sketched below, after
the quoted mail). The patch is attached for review. Feedback is welcome.

Best regards,
Lakshmi

On Tue, Dec 23, 2025 at 5:23 PM lakshmi <[email protected]> wrote:

> Hi Kirill,
>
> I tested your patch on the current master and confirmed the WAL reduction
> during HASH index build.
>
> While testing, I noticed a possible small follow-up improvement in HASH
> overflow handling. Currently, any free overflow page may be reused, which
> can scatter overflow chains and hurt cache locality. Reusing recently freed
> overflow pages first could help, without changing WAL behavior or on-disk
> format.
>
> I would like to work on this as a follow-up enhancement and would welcome
> any suggestions.
>
> Best regards,
> Lakshmi
>
> On Tue, Dec 23, 2025 at 2:31 PM Kirill Reshke <[email protected]>
> wrote:
>
>> There exists an optimization of the index creation process where we omit
>> writing any WAL for the index build. It is currently supported in B-tree,
>> GIN, GiST, and SP-GiST indexes.
>> It works because we do not need to recover anything if index creation
>> fails, because the index was not used by any query. So, the index can be
>> built on-disk, and then, just before making the index alive, we can
>> simply log all pages to WAL.
>>
>> Hash index currently lacks this optimization.
>> PFA implementation.
>>
>> During my testing, I checked the amount of WAL generated by index
>> build before and after the patch was applied. My script was something like:
>>
>> select pg_current_wal_insert_lsn();
>>
>> create index on t using hash (i);
>>
>> select pg_current_wal_insert_lsn();
>>
>> select pg_wal_lsn_diff(lsn1, lsn2);
>>
>> The resulting numbers depend on index size, but I got 2.5-3.5 times less
>> WAL with this patch and 8 times less WAL with this patch +
>> wal_compression=on.
>> Index creation time, however, did not change much...
>>
>> About the implementation:
>> There are many types of records that can be generated during index build.
>> I know for sure these are possible (double-checked using pg_waldump):
>>
>> SPLIT_COMPLETE
>> INSERT
>> SPLIT_ALLOCATE_PAGE
>> SPLIT_PAGE
>> ADD_OVFL_PAGE
>> SQUEEZE_PAGE
>> INIT_META_PAGE
>> INIT_BITMAP_PAGE
>>
>>
>> It looks like SPLIT_COMPLETE and VACUUM_ONE_PAGE are never generated
>> during index build. I'm not sure about MOVE_PAGE_CONTENTS.
>>
>> So, the implementation simply passes an isbuild flag everywhere something
>> is WAL-logged. It looks like this is less invasive than the alternatives.
>>
>> --
>> Best regards,
>> Kirill Reshke
>>
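For reference, the WAL-neutrality check mentioned above followed the same
approach as the script quoted above, using psql's \gset to capture the
LSNs. Run it once on an unpatched build and once with the patch applied,
on the same data, and compare the two numbers:

select pg_current_wal_insert_lsn() as lsn_before \gset

create index on t using hash (i);

select pg_current_wal_insert_lsn() as lsn_after \gset

select pg_size_pretty(pg_wal_lsn_diff(:'lsn_after', :'lsn_before'));

Since the patch only changes which free overflow page the allocator picks,
the two runs should differ only by normal noise, and that is what I observed.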
From 34d4d99baa5377a4dabb8cf9c4bcd0e9bb2c6376 Mon Sep 17 00:00:00 2001
From: Lakshmi <[email protected]>
Date: Mon, 5 Jan 2026 10:38:23 +0530
Subject: [PATCH] hash: reuse recently freed overflow pages

---
 src/backend/access/hash/hashovfl.c | 77 ++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index 3277da19840..c66f8adc056 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -22,6 +22,63 @@
 #include "access/xloginsert.h"
 #include "miscadmin.h"
 #include "utils/rel.h"
+
+/*-------------------------------------------------------------------------
+ * Backend-local reuse cache for recently freed overflow pages.
+ * Best-effort hint only: no correctness or WAL dependency.
+ *-------------------------------------------------------------------------
+ */
+#define HASH_OVFL_REUSE_CACHE_SIZE 32
+
+typedef struct HashOvflReuseCache
+{
+    BlockNumber blocks[HASH_OVFL_REUSE_CACHE_SIZE];
+    int         head;
+    int         count;
+} HashOvflReuseCache;
+
+static HashOvflReuseCache ovfl_reuse_cache;
+
+/* Record a freed overflow page */
+static void
+hash_record_freed_ovflpage(BlockNumber blkno)
+{
+    ovfl_reuse_cache.blocks[ovfl_reuse_cache.head] = blkno;
+    ovfl_reuse_cache.head =
+        (ovfl_reuse_cache.head + 1) % HASH_OVFL_REUSE_CACHE_SIZE;
+
+    if (ovfl_reuse_cache.count < HASH_OVFL_REUSE_CACHE_SIZE)
+        ovfl_reuse_cache.count++;
+}
+
+/* Try to reuse a recently freed overflow page */
+static bool
+hash_try_reuse_cached_ovflpage(Relation rel, Buffer metabuf,
+                               BlockNumber *reuse_blkno)
+{
+    HashMetaPage metap = HashPageGetMeta(BufferGetPage(metabuf));
+    int         i;
+
+    for (i = 0; i < ovfl_reuse_cache.count; i++)
+    {
+        int         idx =
+            (ovfl_reuse_cache.head - 1 - i + HASH_OVFL_REUSE_CACHE_SIZE)
+            % HASH_OVFL_REUSE_CACHE_SIZE;
+        BlockNumber blkno = ovfl_reuse_cache.blocks[idx];
+        uint32      bitno = _hash_ovflblkno_to_bitno(metap, blkno);
+
+        /* Bitmap is authoritative */
+        if (_hash_isbitset(metap->hashm_mapp, bitno))
+        {
+            _hash_clrbit(metap->hashm_mapp, bitno);
+            *reuse_blkno = blkno;
+            return true;
+        }
+    }
+
+    return false;
+}
+
 
 
 static uint32 _hash_firstfreebit(uint32 map);
@@ -132,6 +189,10 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin, boo
     uint32      i,
                 j;
     bool        page_found = false;
+    BlockNumber reuse_blkno;
+
+
+
 
     /*
      * Write-lock the tail page.  Here, we need to maintain locking order such
@@ -186,6 +247,16 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin, boo
     _hash_checkpage(rel, metabuf, LH_META_PAGE);
     metap = HashPageGetMeta(BufferGetPage(metabuf));
 
+    /*
+     * NEW: Try reuse cache before scanning bitmap
+     */
+    if (hash_try_reuse_cached_ovflpage(rel, metabuf, &reuse_blkno))
+    {
+        ovflbuf = _hash_getinitbuf(rel, reuse_blkno);
+        page_found = true;
+        goto found;
+    }
+
     /* start search at hashm_firstfree */
     orig_firstfree = metap->hashm_firstfree;
     first_page = orig_firstfree >> BMPG_SHIFT(metap);
@@ -633,6 +704,12 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
     CLRBIT(freep, bitmapbit);
     MarkBufferDirty(mapbuf);
 
+    /*
+     * NEW: Remember this overflow page for reuse
+     */
+
+    hash_record_freed_ovflpage(ovflblkno);
+
     /* if this is now the first free page, update hashm_firstfree */
     if (ovflbitno < metap->hashm_firstfree)
     {
--
2.39.5
