I see, thanks. I'm now looking into the source code of the statistics part, and I'm a little confused about the column "staop" in the table pg_statistic. In pg_statistic.h, the comment says:
/* ----------------
 * To allow keeping statistics on different kinds of datatypes,
 * we do not hard-wire any particular meaning for the remaining
 * statistical fields.  Instead, we provide several "slots" in which
 * statistical data can be placed.  Each slot includes:
 *      kind        integer code identifying kind of data (see below)
 *      op          OID of associated operator, if needed
 *      numbers     float4 array (for statistical values)
 *      values      anyarray (for representations of data values)
 * The ID and operator fields are never NULL; they are zeroes in an
 * unused slot.  The numbers and values fields are NULL in an unused
 * slot, and might also be NULL in a used slot if the slot kind has
 * no need for one or the other.
 * ----------------
 */

And,

//line 194 : In a "most common values" slot, staop is the OID of the "=" operator used to decide whether values are the same or not.

//line 206 : A "histogram" slot describes the distribution of scalar data. staop is the OID of the "<" operator that describes the sort ordering.

....

I don't understand the role of staop here. How is it used in the optimizer? Is there any example?

Thanks!

2014/1/10 Amit Langote <amitlangot...@gmail.com>

> On Fri, Jan 10, 2014 at 11:19 PM, Atri Sharma <atri.j...@gmail.com> wrote:
> >
> >
> > Sent from my iPad
> >
> > On 10-Jan-2014, at 19:42, "ygnhzeus" <ygnhz...@gmail.com> wrote:
> >
> > Thanks for your reply.
> > So correlation is not related to the calculation of selectivity, right? If I
> > force PostgreSQL not to optimize the join order (by setting
> > join_collapse_limit and from_collapse_limit to 1), is there any other
> > factor that may affect the structure of execution plan regardless of the
> > data access method.
> >
> > 2014-01-10
> > ________________________________
> > ygnhzeus
> > ________________________________
> > From: Amit Langote <amitlangot...@gmail.com>
> > Sent: 2014-01-10 22:00
> > Subject: Re: [GENERAL] How to specify/mock the statistic data of tables in
> > PostgreSQL
> > To: "ygnhzeus"<ygnhz...@gmail.com>
> > Cc: "pgsql-general"<pgsql-general@postgresql.org>
> >
> >
> >
> > AFAIK, correlation is involved in calculation of the costs that are used for
> > deciding the type of access. If the correlation is low, index scan can lead
> > to quite some random reads, hence leading to higher costs.
> >
>
> Ah, I forgot to mention this point about how planner uses correlation
> for access method selection.
>
> And selectivity is a function of statistical distribution of column
> values described in pg_statistic by histograms, most common values
> (with their occurrence frequencies), number of distinct values, etc.
> It has nothing to do with correlation.
>
> --
> Amit Langote
>
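
To make the staop question and the selectivity remark above concrete, here is a minimal Python sketch of how a slot's operator could be consulted. It is a toy model, not PostgreSQL's planner code: the names Slot, find_slot, eq_selectivity and lt_selectivity are hypothetical, operator strings such as "=" and "<" stand in for operator OIDs, and the histogram interpolation is deliberately simplified. The idea it illustrates is only this: a statistics slot is usable for a clause when the operator recorded in the slot matches the semantics the clause needs (equality for a most-common-values slot, sort ordering for a histogram slot).

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Slot:
    """Toy stand-in for one pg_statistic slot (all names are illustrative)."""
    kind: str                                  # stand-in for stakindN: "mcv" or "histogram"
    op: str                                    # stand-in for staopN: "=" or "<"
    values: Sequence                           # stand-in for stavaluesN
    numbers: Optional[Sequence[float]] = None  # stand-in for stanumbersN


def find_slot(slots, kind, op=None):
    """Return the first slot of the given kind whose operator matches.

    Passing op=None means "any operator is acceptable".
    """
    for slot in slots:
        if slot.kind == kind and (op is None or slot.op == op):
            return slot
    return None


def eq_selectivity(slots, const):
    """Estimate the fraction of rows with col = const from an MCV slot."""
    mcv = find_slot(slots, "mcv", "=")
    if mcv is None or mcv.numbers is None:
        return None
    for value, freq in zip(mcv.values, mcv.numbers):
        if value == const:        # the "=" semantics recorded for the slot
            return freq
    return None                   # not a most common value; no estimate here


def lt_selectivity(slots, const):
    """Estimate the fraction of rows with col < const from histogram bounds."""
    hist = find_slot(slots, "histogram", "<")
    if hist is None or len(hist.values) < 2:
        return None
    bounds = hist.values          # sorted according to the "<" ordering
    nbuckets = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const > bounds[-1]:
        return 1.0
    i = bisect_left(bounds, const)             # bounds[i-1] < const <= bounds[i]
    lo, hi = bounds[i - 1], bounds[i]
    frac = (const - lo) / (hi - lo) if hi > lo else 0.5
    return (i - 1 + frac) / nbuckets           # full buckets plus a partial one


if __name__ == "__main__":
    slots = [
        Slot("mcv", "=", [1, 2], [0.3, 0.2]),
        Slot("histogram", "<", [0, 10, 20, 30, 40]),
    ]
    print(eq_selectivity(slots, 2))
    print(lt_selectivity(slots, 15))
```

In this sketch a histogram built for "<" would simply not be found when a different ordering operator is requested, so the estimator falls back to returning None instead of using bounds sorted in the wrong order.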