> > Short summary: > > * It looks to me like the planner vastly overestimates > the # of pages read by index scan in quite a few of my > tables even though stats collected by ANALYZE are correct. > > * The problem happens any time you have multiple columns > that have a number of repeated values in them, and > you CLUSTER the table by a sort using both columns > (like "city,state,zip,phone#" or "firstname,lastname"). > > * I think this is the problem that Mark Kirkwood is seeing > in his threads Query optimizer 8.0.1 and "One Big trend > vs multiple smaller trends" in hackers.
actually [EMAIL PROTECTED], is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers) > > * A test script demonstrating the issue also follows. > > * I think keeping one more stat per attribute in > pg_stastic that could describe this behavior. > > > Longer: > > > If I understand the optimizer correctly, correlation is used > to both guess how much random disk access will be required in > a query; as well as estimate how many pages will be read. > > Unfortunately, many tables in my larger databases have > columns with values that are tightly packed on a few pages; > even though there is no total-ordering across the whole table. > Stephan Szabo described this as a "clumping effect": > http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php Yes. I think we are describing the exact same issue. ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq