Re: [HACKERS] Index AM change proposals, redux

Heikki Linnakangas Wed, 09 Apr 2008 22:38:40 -0700

Tom Lane wrote:

* Proposed change to let both amgetnext and amgetmulti mark returned
tuples as "candidate matches", that is in need of rechecking of quals
against the real heap tuple.  I had originally objected to this on
the grounds that it would require setup work that doesn't happen now,
but looking at the code I see that's wrong --- the setup work *does*
happen anyway, see indexqualorig.  nodeIndexscan.c comments "The
indexqualorig expression is always initialized even though it will only
be used in some uncommon cases --- would be nice to improve that".
So this complaint is probably a red herring.  I'm still a bit concerned
by the fact that the patch only tracks this on a page basis in the
amgetmulti case: if any of the tuples on a page are in-doubt then
everything on the page will have to be rechecked.  Is that good enough?

I think it's good enough. If we wanted to track it on a per-tuple basis,we'd need to store two bits per tuple, AFAICS, doubling the size of thebitmaps.

A bigger issue is whether this is worth applying when nobody seems to be
working on either of the main uses for it (bitmap indexes and GIT
indexes).  There seemed to be some possible marginal use for it in GIST
indexes, but I'm not convinced that's a sufficient reason to complicate
the APIs.

It has some merit on its own. Consider the case where you bitmap AND alossy page and a non-lossy one. We currently make the result lossy,because we can't know which tuples on the page match, and then we needto recheck all tuples on that page. With the support for candidatematches, we could instead keep the non-lossy version of the bitmap andmark the matches as candidates.

In the very best case, we might even apply a further AND to the page,which eliminates all the remaining candidate matches, and skip the heapaccess of that page altogether.

* Proposed change to let amgetnext mark returned tuples as not
necessarily in order, requiring somebody downstream to sort them again.
I was pretty desperately unhappy with that because it required injection
of sorting knowledge into code that really shouldn't have anything to do
with it --- not least because not all indexscans even have a defined
ordering.  Given that it is only needed for one particular possible
implementation of GIT, and Heikki was leaning away from that
implementation anyway at last report, I vote against considering this
any further.


Agreed, there's no use for that without GIT.

* Proposed changes to refactor the TIDBitmap support to include a
concept of a "stream bitmap" being read from disk.  (Not strictly an
index AM change, but closely related.)  While this is clean and nice on
its own, it doesn't seem to have any use unless bitmap indexes happen.
If we leave the code sit it'll probably acquire bit rot, but I'm
disinclined to add a bunch of unused code, too.  Thoughts?

If the code is unused, it can easily bit rot even if it's in CVS. Let'snot apply it. The patch is out there if/when someone picks up the bitmapindex patch again.

* Proposed changes to allow amgetnext to return the actual index keys,
allowing certain types of "unindexable" quals to be checked without
having to fetch the heap entry.  For example a LIKE condition could be
fully checked against the index entry.  Since certain types of indexes
(GIN now, possibly hash in future) are incapable of doing this, there'd
need to be a way of marking index AMs (or perhaps individual indexes) as
able or not able to return their keys, so that the planner could know
whether quals could be pushed into the indexscan.

We don't have any actual submitted patch for this, but AFAIR everyone
agrees it's a good idea.


Agreed :-).

* GIT (Grouped Index Tuple) indexes, which achieve index space savings
in btrees by having a single index tuple represent multiple heap tuples
(on a single heap page) containing a range of key values.  I am not sure
what the development status is --- Heikki had submitted a completed
patch but there seemed to be agreement on making changes, and that's not
been done AFAIK.  The really serious problem I've got with it is that
it'd foreclose the possibility of returning actual index keys from btree
indexes, thus basically killing the usefulness of that idea.  I'm not
convinced it would offer enough gain to be worth paying that price.
Another issue is that we'd need to check how much of the use-case for
GIT has been taken over by HOT.

I don't think there's much overlap with HOT. On the contrary, HOT makesGIT much more useful, because GIT was very sensitive to the clusterorder of the heap, and HOT helps with that.

There is, however, a ton of overlap with index-only scans, and thepossibility to return keys from indexes, as you pointed out.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Index AM change proposals, redux

Reply via email to