Thank you Andres.

If I want to do "a" ( Do one probe of the buffer mapping table for each
block of the relation. Thus O(#relation blocks)) what function calls can I
use, assuming I only have access to the relation id? How can I access and
scan the buffer mapping table?

On Fri, Jan 13, 2023 at 6:27 PM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2023-01-13 17:28:31 -0800, Amin wrote:
> > Before scanning a relation, in the planner stage, I want to make a call
> to
> > retrieve information about how many pages will be a hit for a specific
> > relation. The module pg_buffercache seems to be doing a similar thing.
> > Also, pg_statio_all_tables seems to be having that information, but it is
> > updated after execution. However, I want the information before
> execution.
> > Also not sure how pg_statio_all_tables is created and how I can access it
> > in the code.
>
> There's no cheap way to do that. Currently the only ways are to:
>
> a) Do one probe of the buffer mapping table for each block of the
>    relation. Thus O(#relation blocks).
>
> b) Scan all of buffer headers, check which are for the relation. Thus
>    O(#NBuffers)
>
> Neither of which are a good idea during planning.
>
>
> It might be a bit more realistic to get very rough estimates:
>
> You could compute the table's historic cache hit ratio from pgstats (i.e.
> use
> the data backing pg_statio_all_tables). Of course that's not going to be
> specific to your query (for index scans etc), and might have changed more
> recently. It'd also be completely wrong after a restart.
>
> If we had information about *recent* cache hit patterns for the relation,
> it'd
> be a lot better, but we don't have the infrastructure for that, and
> introducing it would increase the size of the stats entries noticably.
>
> Another way could be to probe the buffer mapping table for a small subset
> of
> the locks and infer the likelihood of other blocks being in shared buffers
> that way.
>
> A third way could be to track the cache hit for relations in backend local
> memory, likely in the relache entry. The big disadvantage would be that
> query
> plans would differ between connections and that connections would need to
> "warm up" to have good plans. But it'd handle restarts nicely.
>
> Greetings,
>
> Andres Freund
>

Reply via email to