Another skim on 0002: reference.sgml is missing a call to &alterStatistic.
ObjectProperty[] contains a comment that the ACL is "same as relation", but is that still correct, given that now stats may be related to more than one relation? Do we even know what the rules for ACLs on cross-relation stats are? One very simple way to get around this is to dictate that all the rels must have the same owner. Perhaps we're not considering the multi-relation case yet? We have this FIXME comment in do_analyze_rel: + * FIXME This sample sizing is mostly OK when computing stats for + * individual columns, but when computing multi-variate stats + * for multivariate stats (histograms, mcv, ...) it's rather + * insufficient. For stats on multiple columns / complex stats + * we need larger sample sizes, because we need to build more + * detailed stats (more MCV items / histogram buckets) to get + * good accuracy. Maybe it'd be appropriate to use samples + * proportional to the table (say, 0.5% - 1%) instead of a + * fixed size might be more appropriate. Also, this should be + * bound to the requested statistics size - e.g. number of MCV + * items or histogram buckets should require several sample + * rows per item/bucket (so the sample should be k*size). Maybe this merits more discussion. Right now we have an upper bound on how much to scan for analyze; if we introduce the idea of scanning a percentage of the relation, the time to analyze very large relations could increase significantly. Do we have an idea of what to do for this? For instance, a rule that would make me comfortable would say to scan a sample 3x the current size when you have a mvstats on 3 columns; then the size of fraction to scan is still bounded. But does that actually work? From the wording of this comment, I assume you don't actually know. In this block (CreateStatistics) + /* look for duplicities */ + for (i = 0; i < numcols; i++) + for (j = 0; j < numcols; j++) + if ((i != j) && (attnums[i] == attnums[j])) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_COLUMN), + errmsg("duplicate column name in statistics definition"))); isn't it easier to have the inner loop go from i+1 to numcols? I wonder if this is sensible with multi-relation statistics: + /* + * Store a dependency too, so that statistics are dropped on DROP TABLE + */ + parentobject.classId = RelationRelationId; + parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel)); + parentobject.objectSubId = 0; + childobject.classId = MvStatisticRelationId; + childobject.objectId = statoid; + childobject.objectSubId = 0; I suppose the idea is to drop the stats if any of the rels they are for is dropped. Right after that you create a dependency on the schema. Is that necessary? Since you have the dependency on the relation, the stats would be dropped by recursion. Why are you #include'ing builtins.h everywhere? RelationGetMVStatList() needs a comment. Please get rid of common.h. It's totally unlike the way we structure our header files. We don't keep headers in src/backend; they're all in src/include. One reason is that the latter gets installed as a whole in include/server, which this file will not be. This file may be necessary to build some extensions in the future, for example. In mvstats.h, please mark function prototypes as "extern". Many files need a pgindent pass. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers