[HACKERS] relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?)
On Sun, Sep 17, 2006 at 04:18:36PM -0400, Tom Lane wrote: * table and index. (Ideally other_pages should include all the other * tables and indexes used by the query too; but we don't have a good way * to get that number here.) A first-order approximation to this would be to add up the total sizes of all the other tables used in the query. I am thinking of leaving out other indexes, mainly because we can't tell at this level which other indexes are actually gonna get used. This would tend to underestimate by leaving out indexes, but not by a lot if you assume indexes are much smaller than their tables. It would also be an overestimate because tables that are not indexscanned concurrently with the one under consideration probably shouldn't be counted anyway. So one might hope these effects would more or less cancel out. Anyway it seems to be a better idea than what we have now. I think it'd be better to attack this problem from the other side; namely looking at what's actually cached. Sadly, I don't think there's any way to actually query the OS for info about what it has buffered, but we can look at what's in shared_buffers and assume that it's a reasonable proxy for the OS's cache. Something like... relBufPages / shared_buffers * effective_cache_size should give us a decent idea of what percentage of a relation will be in cache somewhere. (relBufPages is the number of pages the relation in question has in the buffer). Of course, that raises the question of how to track how many pages are in shared buffers for a relation. Given the criticality of locking there, we probably don't want to update that info in real-time, but for this application it's probably OK to just scan through the buffer every X period of time (maybe after X number of pages read into the buffers). -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?)
Jim C. Nasby [EMAIL PROTECTED] writes: I think it'd be better to attack this problem from the other side; namely looking at what's actually cached. You can kiss goodbye to plan stability if you go that route... and in any case I doubt the assumption that what's in shared buffers is representative of what's in kernel cache. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?)
On Mon, Sep 18, 2006 at 12:20:10AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: I think it'd be better to attack this problem from the other side; namely looking at what's actually cached. You can kiss goodbye to plan stability if you go that route... and in any case I doubt the assumption that what's in shared buffers is representative of what's in kernel cache. Well, there's 2 issues with caching: 1) Is something we're going to want actually in cache right now? 2) If we need to read something more than once (ie: higher level btree pages), what are the odds it will still be in cache when we come around to it the next time. Once the caches are warmed up, looking at what's actually in them would give you a very good probability for #1. I suspect that for large relations, shared_buffers would also match the OS cache pretty well in most cases, almost certainly better than whatever estimate we're using now. But I'm not sure how useful that info is to the planner. For #2 we'd have to know what kind of pressure the caches are under to replace pages and have some kind of idea how frequently the system is hitting them. The pg_statio info might be useful there, though unfortunately in that case I think there's much less likely to be a good correlation between the two. If there was *some* way to track stats on page fetches that came out of the OS cache, I suspect we could make great use of per-relation hit rates to come up with better plans. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings