On 28/03/2026 06:18, Ashutosh Bapat wrote:
Parallely myself and Palak Chaturvedi developed a quick patch to
modernise pg_buffercache_pages() and use tuplestore so that it doesn't
have to rely on NBuffers being the same between start of the scan,
when memory allocated, when the scan ends - a condition possible with
resizing buffer cache. It seems to improve the timings by about 10-30%
on my laptop for 128MB buffercache size. Without this patch the time
taken to execute Lukas's query varies between 10-15ms on my laptop.
With this patch it varies between 8-9ms. So the timing is more stable
as a side effect. It's not a 10x improvement that we are looking for
but it looks like a step in the right direction. That improvement
seems to come purely because we avoid creating a heap tuple. I wonder
if there are some places up in the execution tree where full
heaptuples get formed again instead of continuing to use minimal
tuples or places where we perform some extra actions that are not
required.
I didn't dig into the history to find out why we didn't modernize
pg_buffercache_pages(). I don't see any hazard though.
Committed this modernization patch, thanks!
It would be nice to have a proper row-at-a-time mode that would avoid
materializing the result, but collecting all the data in a temporary
array is clearly worse than just putting them to the tuplestore
directly. The only reason I can think of why we'd prefer to use a
temporary array like that is to get a more consistent snapshot of all
the buffers, by keeping the time spent scanning the buffers as short as
possible. But we're not getting a consistent view anyway, it's just a
matter of degree.
I wondered about this in pg_buffercache_pages.c:
/*
* To smoothly support upgrades from version 1.0 of this extension
* transparently handle the (non-)existence of the pinning_backends
* column. We unfortunately have to get the result type for that... - we
* can't use the result type determined by the function definition
without
* potentially crashing when somebody uses the old (or even wrong)
* function definition though.
*/
if (get_call_result_type(fcinfo, NULL, &expected_tupledesc) !=
TYPEFUNC_COMPOSITE)
elog(ERROR, "return type must be a row type");
if (expected_tupledesc->natts < NUM_BUFFERCACHE_PAGES_MIN_ELEM ||
expected_tupledesc->natts > NUM_BUFFERCACHE_PAGES_ELEM)
elog(ERROR, "incorrect number of output arguments");
I guess it's still needed, if you have pg_upgraded all the way from 1.0.
To test that, I created this view to match the old 1.0 definition:
CREATE VIEW public.legacy_pg_buffercache AS
SELECT bufferid,
relfilenode,
reltablespace,
reldatabase,
relforknumber,
relblocknumber,
isdirty,
usagecount
FROM public.pg_buffercache_pages() p(bufferid integer, relfilenode
oid, reltablespace oid, reldatabase oid, relforknumber smallint,
relblocknumber bigint, isdirty boolean, usagecount smallint);
"select * from public.legacy_pg_buffercache" still works, so all good.
- Heikki