On 13 March 2017 at 23:00, David Rowley <david.row...@2ndquadrant.com> wrote: > > 0003: > > No more time today. Will try and get to those soon. >
0003: I've now read this patch. My main aim here was to learn what it does and how it works. I need to spend much longer understanding how your calculating the functional dependencies. In the meantime I've pasted the notes I took while reading over the patch. + default: + elog(ERROR, "unexcpected statistics type requested: %d", type); "unexpected", but we generally use "unknown". @@ -1293,7 +1294,8 @@ get_relation_statistics(RelOptInfo *rel, Relation relation) info->rel = rel; /* built/available statistics */ - info->ndist_built = true; + info->ndist_built = stats_are_built(htup, STATS_EXT_NDISTINCT); + info->deps_built = stats_are_built(htup, STATS_EXT_DEPENDENCIES); I don't really like how this function is shaping up. You're calling stats_are_built() potentially twice for each stats type. There must be a nicer way to do this. Are non-built stats common enough to optimize building a StatisticExtInfo regardless and throwing it away if it happens to be useless? Can you also rename mvoid to become something more esoid or similar. I seem to always read it as m-void instead of mv-oid and naturally I expect a void pointer rather than an Oid. +dependencies, and for each one count the number of rows rows consistent it. duplicate word "rows" +Apllying the functional dependencies is fairly simple - given a list of Applying +In this case the default estimation based on AVIA principle happens to work hmm, maybe I should know what AVIA principles are, but I don't. Is there something I should be reading? I searched a bit around the internet for a few minutes it didn't seem have a great idea either. + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group 2017 + Assert(tmp <= ((char *) output + len)); Shouldn't you just Assert(tmp == ((char *) output + len)); at the end of the loop? + if (dependencies->magic != STATS_DEPS_MAGIC) + elog(ERROR, "invalid dependency magic %d (expected %dd)", + dependencies->magic, STATS_DEPS_MAGIC); + + if (dependencies->type != STATS_DEPS_TYPE_BASIC) + elog(ERROR, "invalid dependency type %d (expected %dd)", + dependencies->type, STATS_DEPS_TYPE_BASIC); %dd ? + Assert(dependencies->ndeps > 0); Why Assert() and not elog() ? Wouldn't think mean that a corrupt dependency could fail an Assert + dependencies = (MVDependencies) palloc0(sizeof(MVDependenciesData)); Why palloc0() and not palloc()? Can you not just read it into a variable on the stack, then check the exact size using tempdeps.ndeps * sizeof(MVDependency), then memcpy() it over? That'll save you the realloc() + /* what minimum bytea size do we expect for those parameters */ + expected_size = offsetof(MVDependenciesData, deps) + + dependencies->ndeps * (offsetof(MVDependencyData, attributes) + + sizeof(AttrNumber) * 2); Can't quite make sense of this yet. Why * 2? + /* is the number of attributes valid? */ + Assert((k >= 2) && (k <= STATS_MAX_DIMENSIONS)); Seems like a bad idea to Assert() this. Wouldn't some bad data being deserialized cause an Assert failure? + d = (MVDependency) palloc0(offsetof(MVDependencyData, attributes) + + (k * sizeof(AttrNumber))); Why palloc0(), you seem to write out all the fields right away. Seems like a waste to zero the memory. + /* still within the bytea */ + Assert(tmp <= ((char *) data + VARSIZE_ANY(data))); Any point? You're already Asserting that you've consumed the entire array at the end anyway. + appendStringInfoString(&str, "["); appendStringInfoChar(&str. '['); would be better. + ret = pstrdup(str.data); ret = pnstrdup(str.data, str.len); +CREATE STATISTICS s1 WITH (dependencies) ON (a,a) FROM functional_dependencies; +ERROR: duplicate column name in statistics definition Is it worth mentioning which column here? I'll try to spend more time understanding 0003 soon. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services