On 3/24/21 7:24 AM, Justin Pryzby wrote: > Most importantly, it looks like this forgets to update catalog documentation > for stxexprs and stxkind='e' >
Good catch. > It seems like you're preferring to use pluralized "statistics" in a lot of > places that sound wrong to me. For example: >> Currently the first statistics wins, which seems silly. > I can write more separately, but I think this is resolved and clarified if you > write "statistics object" and not just "statistics". > OK "statistics object" seems better and more consistent. >> + Name of schema containing table > > I don't know about the nearby descriptions, but this one sounds too much like > a > "schema-containing" table. Say "Name of the schema which contains the table" > ? > I think the current spelling is OK / consistent with the other catalogs. >> + Name of table > > Say "name of table on which the extended statistics are defined" > I've used "Name of table the statistics object is defined on". >> + Name of extended statistics > > "Name of the extended statistic object" > >> + Owner of the extended statistics > > ..object > OK >> + Expression the extended statistics is defined on > > I think it should say "the extended statistic", or "the extended statistics > object". Maybe "..on which the extended statistic is defined" > OK >> + of random access to the disk. (This expression is null if the >> expression >> + data type does not have a <literal><</literal> operator.) > > expression's data type > OK >> + much-too-small row count estimate in the first two queries. Moreover, the > > maybe say "dramatically underestimates the rowcount" > I've changed this to "... results in a significant underestimate of row count". >> + planner has no information about relationship between the expressions, >> so it > > the relationship > OK >> + assumes the two <literal>WHERE</literal> and <literal>GROUP BY</literal> >> + conditions are independent, and multiplies their selectivities together >> to >> + arrive at a much-too-high group count estimate in the aggregate query. > > severe overestimate ? > OK >> + This is further exacerbated by the lack of accurate statistics for the >> + expressions, forcing the planner to use default ndistinct estimate for >> the > > use *a default > OK >> + expression derived from ndistinct for the column. With such statistics, >> the >> + planner recognizes that the conditions are correlated and arrives at much >> + more accurate estimates. > > are correlated comma > OK >> + if (type->lt_opr == InvalidOid) > > These could be !OidIsValid > Maybe, but it's like this already. I'll leave this alone and then fix/backpatch separately. >> + * expressions. It's either expensive or very easy to defeat for >> + * determined used, and there's no risk if we allow such statistics (the >> + * statistics is useless, but harmless). > > I think it's meant to say "for a determined user" ? > Right. >> + * If there are no simply-referenced columns, give the statistics an >> auto >> + * dependency on the whole table. In most cases, this will be >> redundant, >> + * but it might not be if the statistics expressions contain no Vars >> + * (which might seem strange but possible). >> + */ >> + if (!nattnums) >> + { >> + ObjectAddressSet(parentobject, RelationRelationId, relid); >> + recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO); >> + } > > Can this be unconditional ? > What would be the benefit? This behavior copied from index_create, so I'd prefer keeping it the same for consistency reason. Presumably it's like that for some reason (a bit of cargo cult programming, I know). >> + * Translate the array of indexs to regular attnums for the dependency >> (we > > sp: indexes > OK >> + * Not found a matching expression, so >> we can simply skip > > Found no matching expr > OK >> + /* if found a matching, */ > > matching .. > Matching dependency. >> +examine_attribute(Node *expr) > > Maybe you should rename this to something distinct ? So it's easy to add a > breakpoint there, for example. > What would be a better name? It's not difficult to add a breakpoint using line number, for example. >> + stats->anl_context = CurrentMemoryContext; /* XXX should be using >> + >> * something else? */ > >> + bool nulls[Natts_pg_statistic]; > ... >> + * Construct a new pg_statistic tuple >> + */ >> + for (i = 0; i < Natts_pg_statistic; ++i) >> + { >> + nulls[i] = false; >> + } > > Shouldn't you just write nulls[Natts_pg_statistic] = {false}; > or at least: memset(nulls, 0, sizeof(nulls)); > Maybe, but it's a copy of what update_attstats() does, so I prefer keeping it the same. >> + * We don't store collations used to build the >> statistics, but >> + * we can use the collation for the attribute >> itself, as >> + * stored in varcollid. We do reset the >> statistics after a >> + * type change (including collation change), so >> this is OK. We >> + * may need to relax this after allowing >> extended statistics >> + * on expressions. > > This text should be updated or removed ? > Yeah, the last sentence is obsolete. Updated. >> @@ -2705,7 +2705,108 @@ describeOneTableDetails(const char *schemaname, >> } >> >> /* print any extended statistics */ >> - if (pset.sversion >= 100000) >> + if (pset.sversion >= 140000) >> + { >> + printfPQExpBuffer(&buf, >> + "SELECT oid, " >> + >> "stxrelid::pg_catalog.regclass, " >> + >> "stxnamespace::pg_catalog.regnamespace AS nsp, " >> + "stxname,\n" >> + >> "pg_get_statisticsobjdef_columns(oid) AS columns,\n" >> + " 'd' = any(stxkind) >> AS ndist_enabled,\n" >> + " 'f' = any(stxkind) >> AS deps_enabled,\n" >> + " 'm' = any(stxkind) >> AS mcv_enabled,\n"); >> + >> + if (pset.sversion >= 130000) >> + appendPQExpBufferStr(&buf, " stxstattarget\n"); >> + else >> + appendPQExpBufferStr(&buf, " -1 AS >> stxstattarget\n"); > > >= 130000 is fully determined by >= 14000 :) > Ah, right. >> + * type of the opclass, which is not interesting for our purposes. >> (Note: >> + * if we did anything with non-expression index columns, we'd need to > > index is wrong ? > Fixed > I mentioned a bunch of other references to "index" and "predicate" which are > still around: > Whooops, sorry. Fixed. I'll post a cleaned-up version of the patch addressing Dean's review comments too. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company