Re: [HACKERS] Avoiding redundant fetches of btree index metapages
On Tue, 2006-04-25 at 13:43 -0400, Tom Lane wrote:
> What I'm considering doing to fix that is require any change to a
> btree's metapage data to send out a relcache inval message for the
> index. That will force all backends to flush the stale cache item not
> later than the start of their next transaction, and thereby guarantee
> that they aren't using pointers that are too old to be safe against
> vacuuming. (There are other ways we could handle this, but that one
> seems like the simplest and least invasive.)
>
> Comments? Anyone see any flaws in the reasoning?

Hmmm, I'm slightly worried that frequently-churning small tables will be
made even slower by this. What do you think?

> * Re-using rd_targblock was OK for a quick hack because we don't use
> it for indexes, but it's too klugy for real use, and it's not quite
> enough info anyway (we really ought to cache the root level as well
> as root block number). I'm planning to add a void * pointer to the
> Relation struct that the index AM is allowed to use as it sees fit,
> with the understanding that any pointed-to data lives in rd_indexcxt
> and the pointer will be reset to NULL on any relcache clear. btree
> would store a copy of the BTMetaPageData struct. The other AMs might
> have some other use for this.

So we would be able to cache other items also?

I'd also considered caching the rightmost page, for when the table grows
via a monotonically increasing key. Similar benefits, same problems, so
a similar solution sounds like the right way. For that case we'd save N
block accesses to locate the rightmost leaf page. If the cache is wrong,
we could seek again from the root at a cost of 1 additional access (or
move right until we find it, depending upon how stale we think the cache
is, if we can find a heuristic to gauge that). We would only need to
send an invalidation message when VACUUM removes a page, as well as for
any insertion other than the rightmost page. Maybe do this as an index
option, e.g. APPEND (MONOTONIC seems like a poor choice, even if it
would be more correct)?

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
Re: [HACKERS] Avoiding redundant fetches of btree index metapages
Simon Riggs [EMAIL PROTECTED] writes:
> Hmmm, I'm slightly worried that frequently-churning small tables will
> be made even slower by this. What do you think?

How so?

> So we would be able to cache other items also?

Only to the extent that you can guarantee a stale cache entry isn't a
problem. We've already done the analysis involved for the existing
metapage entries, but anything else would require more thought. (And
more cache flush events.)

> For that case we'd save N block accesses to locate the rightmost leaf
> page.

Surely you mean log(N).

			regards, tom lane
Re: [HACKERS] Avoiding redundant fetches of btree index metapages
On Wed, 2006-04-26 at 12:53 -0400, Tom Lane wrote:
> > So we would be able to cache other items also?
>
> Only to the extent that you can guarantee a stale cache entry isn't a
> problem. We've already done the analysis involved for the existing
> metapage entries, but anything else would require more thought. (And
> more cache flush events.)

You mean performance tests! Will do. Methinks that cache flushing is the
key to performance for that idea.

> > For that case we'd save N block accesses to locate the rightmost
> > leaf page.
>
> Surely you mean log(N).

Depends what N is. I meant the level, not the number of rows.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com/
Re: [HACKERS] Avoiding redundant fetches of btree index metapages
Tom Lane wrote:
> Some experimentation over the weekend turned up the result that
> $SUBJECT is a good idea. I had never thought about this much before,
> figuring that in a heavily-used index the metapage would stay in
> cache anyway and so fetching it at the start of each index search
> isn't costing any extra I/O. That's true, but what it does cost is
> bufmgr contention, and in fact more than you'd expect because the
> amount of work actually done before releasing the page again is
> minuscule. (See off-list discussion attached below.)

Wow, this is extremely nice. Congratulations on another well-spotted
performance problem solved.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support