Re: [HACKERS] No heap lookups on index

Jeremy Drake Thu, 19 Jan 2006 17:56:39 -0800

On Thu, 19 Jan 2006, Jim C. Nasby wrote:

> > Feel free to do whatever with this, it's pretty fast for tables where
> > seeks to validate tuples would hurt, but you do get back dead things...
>
> How'd you then weed out the dead tuples?


I didn't get that far with it.  The purpose of this function was to
quickly put something together to demonstrate that the overhead of seeking
to the proper tuples in the heap to determine their visibility was the
main component of the time being spent to satisfy our queries.

> Basically, numbers talk. If there were convincing numbers for something
> that wasn't a corner-case that showed a marked improvement then there'd
> be much more interest in getting this into the backend in some fashion.

I could get some numbers of how much time validating tuples adds to a
query, but I don't think that that would be horribly novel.  BTW,
hopefully I did not make you think that I intended to get this into
the official backend.  This function was only meant to demonstrate to the
people around here that the visibility check was the bottleneck we were
seeing.  The function may also be interesting as a demonstration of how
indexes are handled in postgres, as you can see when tuples are flagged as
no longer valid and when they are not.  I have put xmin into an index so
that I could use this function to better visualize when index tuples are
left behind (I tried to put xmax in there too, but I never saw them
change, after checking the code it turns out that the index is never told
about changes in xmax).


We were seeing this case: All rows in our table are visible (we are the
only transaction on the machine and we did a VACUUM FULL ANALYZE before).
We rebooted to ensure no caching.  We were seeing times which, upon
division by the number of rows returned by the index scan, were remarkably
close to the average seek time listed on the specs for the hard drive in
the testing box.  This was about 5ms, which doesn't sound like much, but
given a large enough number of rows and a few joins, 5ms per tuple adds up
quickly.  This implies that we were seeing approximately the worst case as
far as the distribution of the relevant tuples on pages, ie each tuple we
wanted was on a different heap page.

Digging back to some times we had collected from this experiment,
apparently we were taking about 15 to 20 seconds to run a particular
query, and when we used the function I previously posted those times were
reduced to 5 seconds.  This was a while ago, however, so these times are
probably not very accurate and we probably made other tweaks to speed
things up since then.  But it gives an idea.  We could come up with more
absolute numbers, but I think people already know what they would look
like.


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] No heap lookups on index

Reply via email to