Thank you Andres. If I want to do "a" ( Do one probe of the buffer mapping table for each block of the relation. Thus O(#relation blocks)) what function calls can I use, assuming I only have access to the relation id? How can I access and scan the buffer mapping table?
On Fri, Jan 13, 2023 at 6:27 PM Andres Freund <and...@anarazel.de> wrote: > Hi, > > On 2023-01-13 17:28:31 -0800, Amin wrote: > > Before scanning a relation, in the planner stage, I want to make a call > to > > retrieve information about how many pages will be a hit for a specific > > relation. The module pg_buffercache seems to be doing a similar thing. > > Also, pg_statio_all_tables seems to be having that information, but it is > > updated after execution. However, I want the information before > execution. > > Also not sure how pg_statio_all_tables is created and how I can access it > > in the code. > > There's no cheap way to do that. Currently the only ways are to: > > a) Do one probe of the buffer mapping table for each block of the > relation. Thus O(#relation blocks). > > b) Scan all of buffer headers, check which are for the relation. Thus > O(#NBuffers) > > Neither of which are a good idea during planning. > > > It might be a bit more realistic to get very rough estimates: > > You could compute the table's historic cache hit ratio from pgstats (i.e. > use > the data backing pg_statio_all_tables). Of course that's not going to be > specific to your query (for index scans etc), and might have changed more > recently. It'd also be completely wrong after a restart. > > If we had information about *recent* cache hit patterns for the relation, > it'd > be a lot better, but we don't have the infrastructure for that, and > introducing it would increase the size of the stats entries noticably. > > Another way could be to probe the buffer mapping table for a small subset > of > the locks and infer the likelihood of other blocks being in shared buffers > that way. > > A third way could be to track the cache hit for relations in backend local > memory, likely in the relache entry. The big disadvantage would be that > query > plans would differ between connections and that connections would need to > "warm up" to have good plans. But it'd handle restarts nicely. > > Greetings, > > Andres Freund >