On Thu, 19 Jan 2006, Jim C. Nasby wrote: > > Feel free to do whatever with this, it's pretty fast for tables where > > seeks to validate tuples would hurt, but you do get back dead things... > > How'd you then weed out the dead tuples?
I didn't get that far with it. The purpose of this function was to quickly put something together to demonstrate that the overhead of seeking to the proper tuples in the heap to determine their visibility was the main component of the time being spent to satisfy our queries. > Basically, numbers talk. If there were convincing numbers for something > that wasn't a corner-case that showed a marked improvement then there'd > be much more interest in getting this into the backend in some fashion. I could get some numbers of how much time validating tuples adds to a query, but I don't think that that would be horribly novel. BTW, hopefully I did not make you think that I intended to get this into the official backend. This function was only meant to demonstrate to the people around here that the visibility check was the bottleneck we were seeing. The function may also be interesting as a demonstration of how indexes are handled in postgres, as you can see when tuples are flagged as no longer valid and when they are not. I have put xmin into an index so that I could use this function to better visualize when index tuples are left behind (I tried to put xmax in there too, but I never saw them change, after checking the code it turns out that the index is never told about changes in xmax). We were seeing this case: All rows in our table are visible (we are the only transaction on the machine and we did a VACUUM FULL ANALYZE before). We rebooted to ensure no caching. We were seeing times which, upon division by the number of rows returned by the index scan, were remarkably close to the average seek time listed on the specs for the hard drive in the testing box. This was about 5ms, which doesn't sound like much, but given a large enough number of rows and a few joins, 5ms per tuple adds up quickly. This implies that we were seeing approximately the worst case as far as the distribution of the relevant tuples on pages, ie each tuple we wanted was on a different heap page. Digging back to some times we had collected from this experiment, apparently we were taking about 15 to 20 seconds to run a particular query, and when we used the function I previously posted those times were reduced to 5 seconds. This was a while ago, however, so these times are probably not very accurate and we probably made other tweaks to speed things up since then. But it gives an idea. We could come up with more absolute numbers, but I think people already know what they would look like. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster