On Thu, Feb 23, 2012 at 3:34 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> The industry-accepted description for non-sequential access is
>>> "random access", whether or not the function that describes the
>>> movement is entirely random. To argue otherwise is merely
>>> hairsplitting.
>>
>> I don't think so.
>
> PostgreSQL already uses a parameter called "random_page_cost" to
> describe non-sequential access. Perhaps that is wrong and we need a
> third parameter?
>
>> For example, a bitmap index scan contrives to speed things up by
>> arranging for the table I/O to happen in ascending block number
>> order, with skips, rather than in random order, as a plain index
>> scan would do, and that seems to be a pretty effective technique.
>> Except to the extent that it interferes with the kernel's ability
>> to do readahead, it really can't be cheaper to read blocks 1, 2, 3,
>> 4, and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3
>> can't require more effort than reading it.
>
> By that argument, ANALYZE could never run longer than VACUUM ANALYZE,
> so you disagree with Tom and me, and you can't explain Pavel's
> results....
>
> cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost
> of accessing blocks, even though the author knew the access was in
> ascending block number order. Why was that?
>
> Note that the cost_bitmap_heap_scan() cost can be greater than the
> cost_seqscan() cost for certain parameter values.
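Before replying point by point, here's the ordering distinction in
miniature: a plain index scan touches heap blocks in whatever order the
index returns TIDs, while a bitmap heap scan collects the matching
block numbers first and then visits them in ascending order, with
skips. A toy sketch of the difference (made-up block numbers, obviously
not the actual executor code):

#include <stdio.h>
#include <stdlib.h>

static int
cmp_block(const void *a, const void *b)
{
    int ia = *(const int *) a;
    int ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

int
main(void)
{
    /* Heap block numbers in the order a plain index scan might visit
     * them: whatever order the matching TIDs come out of the index. */
    int index_order[] = {17, 3, 42, 3, 9, 17, 28};
    int n = (int) (sizeof(index_order) / sizeof(index_order[0]));
    int bitmap_order[7];
    int i, m = 0;

    /* A bitmap heap scan instead sorts and de-duplicates the block
     * numbers, so the heap is read in ascending order, with skips. */
    for (i = 0; i < n; i++)
        bitmap_order[i] = index_order[i];
    qsort(bitmap_order, n, sizeof(int), cmp_block);
    for (i = 0; i < n; i++)
        if (m == 0 || bitmap_order[i] != bitmap_order[m - 1])
            bitmap_order[m++] = bitmap_order[i];

    printf("index scan order:  ");
    for (i = 0; i < n; i++)
        printf("%d ", index_order[i]);
    printf("\nbitmap scan order: ");
    for (i = 0; i < m; i++)
        printf("%d ", bitmap_order[i]);
    printf("\n");
    return 0;
}

Ascending-with-skips is much friendlier to the disk than truly random
order, though the skips can still defeat the kernel's read-ahead
heuristics, which is relevant below.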
I think all three of us are saying more or less the same thing in
slightly different words, so I'd rather not have an argument about this
one. But you're right: I can't explain Pavel's results, unless doing
ANALYZE before VACUUM is causing skip-block reads that defeat the
kernel's read-ahead detection. I think it's fairly self-evident that
reading a fixed-size subset of the pages in ascending order can't *in
general* be more expensive than reading all of an arbitrarily large
table, and so I believe we're all in agreement that the behavior he
observed is unusual.

As to the cost estimation stuff, we use random_page_cost as an
approximation: there may be a head seek involved, but to do better we'd
have to estimate the likely length of the seek based on the number of
blocks skipped, which we currently view as irrelevant. It's also not
clear that doing so would improve the quality of the estimate very
much, because there are other, probably larger sources of error: for
one, a sequential logical block number doesn't imply a sequential
physical position on the platter, since the OS often fragments the
file, especially (I think) on Windows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
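PS: On Simon's question of why cost_bitmap_heap_scan() charges
random_page_cost even though it knows the access is in ascending block
order: roughly, the model interpolates between the two page costs based
on how densely the bitmap covers the table. A simplified sketch of that
idea (not the actual costsize.c code; the helper function and its
parameters are made up for illustration):

#include <math.h>
#include <stdio.h>

/* Sketch of the idea behind the bitmap heap scan page cost: charge
 * something between seq_page_cost and random_page_cost depending on
 * what fraction of the table's pages the bitmap selects.  Illustration
 * only, not the actual PostgreSQL costing code. */
static double
bitmap_cost_per_page(double pages_fetched,   /* pages the bitmap selects */
                     double total_pages,     /* pages in the whole table */
                     double seq_page_cost,
                     double random_page_cost)
{
    if (pages_fetched >= total_pages)
        return seq_page_cost;        /* reading everything: sequential */
    if (pages_fetched < 2.0)
        return random_page_cost;     /* an isolated page: a full seek */

    /* Denser bitmaps approach sequential cost; sparse ones stay near
     * random cost. */
    return random_page_cost -
        (random_page_cost - seq_page_cost) *
        sqrt(pages_fetched / total_pages);
}

int
main(void)
{
    double T = 10000.0;              /* table size in pages */
    double fetched;

    for (fetched = 10.0; fetched <= T; fetched *= 10.0)
        printf("%7.0f of %.0f pages -> %.3f per page\n",
               fetched, T, bitmap_cost_per_page(fetched, T, 1.0, 4.0));
    return 0;
}

With the default seq_page_cost = 1.0 and random_page_cost = 4.0, a
sparse bitmap is charged close to 4.0 per page, which is how the bitmap
estimate can come out higher than the seqscan estimate for some
parameter values, as you note.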