>[EMAIL PROTECTED]@[EMAIL PROTECTED] wrote > > Short summary: > > > > * It looks to me like the planner vastly overestimates > > the # of pages read by index scan in quite a few of my > > tables even though stats collected by ANALYZE are correct. > > > > * The problem happens any time you have multiple columns > > that have a number of repeated values in them, and > > you CLUSTER the table by a sort using both columns > > (like "city,state,zip,phone#" or "firstname,lastname"). > > > > * I think this is the problem that Mark Kirkwood is seeing > > in his threads Query optimizer 8.0.1 and "One Big trend > > vs multiple smaller trends" in hackers. > > actually [EMAIL PROTECTED], is Mark Woodward. Pleased to meet you. > :) > > (I hate using my name on lists like this because of spammers) > > > > > * A test script demonstrating the issue also follows. > > > > * I think keeping one more stat per attribute in > > pg_stastic that could describe this behavior. > > > > > > Longer: > > > > > > If I understand the optimizer correctly, correlation is used > > to both guess how much random disk access will be required in > > a query; as well as estimate how many pages will be read. > > > > Unfortunately, many tables in my larger databases have > > columns with values that are tightly packed on a few pages; > > even though there is no total-ordering across the whole table. > > Stephan Szabo described this as a "clumping effect": > > > http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php > > Yes. > > I think we are describing the exact same issue. >
I believe that is a very common case. Many natural value distributions follow the Zipf distribution, which is commonly known as the 80-20 curve. Best Regards, SImon Riggs ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend