[HACKERS] TODO item: Allow data to be pulled directly from indexes

Karl Schnaitter Sun, 29 Jun 2008 08:49:34 -0700

Sometime last year, a discussion started about including visibility metadata to avoid heap fetches during an index scan:


http://archives.postgresql.org/pgsql-patches/2007-10/msg00166.php
http://archives.postgresql.org/pgsql-patches/2008-01/msg00049.php


I think the last discussion on this was in April:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00618.php (last item)

I have worked with the current patch, and I have some thoughts about that approach and the approaches listed in the TODO item. The TODO lists three approaches, in short:

(1) Add a bit for an index tuple that indicates "visible" or "maybe visible."

(2) Use a per-table bitmap that indicates which pages have at least one tuple that is not visible to all transactions.

(3) Same as (2) but at the granularity of one bit per table.

The approach in the patch is different:

(4) Add transaction ids, etc. to the index tuple (totaling 16 bytes).

I would group (1) & (4) together and (2) & (3) together. I think the time and space trade-offs are pretty obvious, so I won't waste time on those.

(1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has noted (in the linked message) that this is not reliable if the index has any expression-valued columns, because it is not always possible to find the old index entry. For this reason, the proposed patch does not keep visibility metadata for indexes on expressions. This seems like a reasonable limitation --- indexed expressions are just less efficient.

The main difference between (1) & (4) is that (1) will sometimes require heap lookups and (4) never will. Moreover, the heap lookups in (1) will be difficult for the optimizer to estimate, unless some special statistics can be maintained for this purpose.
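As a rough illustration of that difference, here is a toy sketch in Python. Everything in it is an assumption made for illustration (the class names, the fields, and especially the visibility rule, which treats every transaction id below the snapshot's as committed); it is not how the patch or PostgreSQL actually decides visibility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexTupleBit:
    """Approach (1): a single hint bit per index tuple (hypothetical layout)."""
    key: int
    known_visible: bool  # True = "visible", False = "maybe visible"


@dataclass
class IndexTupleXids:
    """Approach (4): transaction ids stored in the index tuple (hypothetical layout)."""
    key: int
    xmin: int            # inserting transaction id
    xmax: Optional[int]  # deleting transaction id, None if never deleted


def needs_heap_fetch(t: IndexTupleBit) -> bool:
    # With only a bit, "maybe visible" forces a trip to the heap.
    return not t.known_visible


def visible_from_index(t: IndexTupleXids, snapshot_xid: int) -> bool:
    # Toy rule (assumption): every xid below snapshot_xid has committed,
    # everything at or above it is invisible. No heap access is needed.
    inserted = t.xmin < snapshot_xid
    deleted = t.xmax is not None and t.xmax < snapshot_xid
    return inserted and not deleted
```

With (1), the number of heap fetches depends on how many tuples currently carry the "maybe visible" state, which is what makes it hard to estimate; with (4), the answer comes from the index tuple alone.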

I should mention there is a major flaw in the patch, because it puts pointers to HOT tuples in the index, in order to capture the different transaction ids in the chain. I think this can be fixed by only pointing to the root of the HOT chain, and setting xmin/xmax to the entire range of transaction ids spanned by the chain. I'm not sure about all the details (the ctid and some other bits also need to be set).
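One possible reading of "the entire range of transaction ids spanned by the chain" is sketched below. The function name and the representation of a chain as a list of (xmin, xmax) pairs are hypothetical, and the sketch assumes the chain is ordered from root to newest version and ignores xid wraparound, aborted transactions, and the ctid and other bits mentioned above.

```python
from typing import List, Optional, Tuple

# One chain member: (xmin, xmax), with xmax None while the version is live.
ChainMember = Tuple[int, Optional[int]]


def collapse_hot_chain(chain: List[ChainMember]) -> ChainMember:
    """Return the (xmin, xmax) a single index entry for the chain root might carry."""
    if not chain:
        raise ValueError("a HOT chain has at least a root tuple")
    first_xmin = chain[0][0]   # transaction that created the root version
    last_xmax = chain[-1][1]   # transaction that ended the newest version, if any
    return first_xmin, last_xmax
```

For example, a chain [(100, 105), (105, 110), (110, None)] would collapse to (100, None). A range this wide only says that some version in the chain may be visible, so whether it is precise enough to skip the heap is exactly the open question.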

(2) & (3) can work for any index, and they are quite elegant in the way that the overhead does not change with the number of indexes. The TODO also notes the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great idea in general, but it does not serve the intended purpose of this TODO item. Once a page gets marked as requiring visibility checks, it cannot be unmarked until the next VACUUM. The whole point of this feature is that we are willing to be more proactive during updates in order to make index access more efficient.
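The marking behaviour of (2) can be sketched with a toy per-table bitmap. The class and method names are hypothetical and the model is deliberately minimal: one flag per heap page, set by any modification and cleared only by vacuum.

```python
class PageVisibilityMap:
    """Toy model of approach (2): one bit per heap page (hypothetical API)."""

    def __init__(self, n_pages: int) -> None:
        # False = every tuple on the page is visible to all transactions.
        self.needs_check = [False] * n_pages

    def mark_modified(self, page: int) -> None:
        # Any INSERT, UPDATE or DELETE touching the page sets the bit.
        self.needs_check[page] = True

    def vacuum_page(self, page: int) -> None:
        # Only VACUUM clears the bit in this model.
        self.needs_check[page] = False

    def needs_heap_fetch(self, page: int) -> bool:
        # An index scan must visit the heap for every tuple on a marked page.
        return self.needs_check[page]
```

A single update to page 2 makes needs_heap_fetch(2) return True for every later index scan, for all tuples on that page, until vacuum_page(2) runs. The cost is independent of the number of indexes, but a frequently updated table loses the benefit between vacuums.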

So in summary, I think that (2) would be nice as a separate feature, with (1) and (4) being more favorable for index-only scans. The obvious trouble with (4) is the extra space overhead. There are also issues with correctness that I mentioned (any thoughts here would be appreciated). Other than that, I would favor (4) because it offers the most stable performance.

Please let me know if you agree/disagree with anything here. I need to get this feature implemented for my research, but I would also love to contribute it to the community, so your opinions matter a lot.



