Re: [HACKERS] correlation in pg_stats

Simon Riggs Tue, 08 Feb 2005 13:39:21 -0800

>[EMAIL PROTECTED]@[EMAIL PROTECTED] wrote
> > Short summary:
> >
> >   * It looks to me like the planner vastly overestimates
> >     the # of pages read by index scan in quite a few of my
> >     tables even though stats collected by ANALYZE are correct.
> >
> >   * The problem happens any time you have multiple columns
> >     that have a number of repeated values in them, and
> >     you CLUSTER the table by a sort using both columns
> >     (like "city,state,zip,phone#" or "firstname,lastname").
> >
> >   * I think this is the problem that Mark Kirkwood is seeing
> >     in his threads Query optimizer 8.0.1 and "One Big trend
> >     vs multiple smaller trends" in hackers.
>
> actually [EMAIL PROTECTED], is Mark Woodward. Pleased to meet you.
> :)
>
> (I hate using my name on lists like this because of spammers)
>
> >
> >   * A test script demonstrating the issue also follows.
> >
> >   * I think keeping one more stat per attribute in
> >     pg_stastic that could describe this behavior.
> >
> >
> > Longer:
> >
> >
> >   If I understand the optimizer correctly,  correlation is used
> >   to both guess how much random disk access will be required in
> >   a query; as well as estimate how many pages will be read.
> >
> >   Unfortunately, many tables in my larger databases have
> >   columns with values that are tightly packed on a few pages;
> >   even though there is no total-ordering across the whole table.
> >   Stephan Szabo described this as a "clumping effect":
> >
> http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php
>
> Yes.
>
> I think we are describing the exact same issue.
>


I believe that is a very common case.

Many natural value distributions follow the Zipf distribution, which is
commonly known as the 80-20 curve.

Best Regards, SImon Riggs


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: [HACKERS] correlation in pg_stats

Reply via email to