On Thu, 4 Mar 2021 at 22:16, Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > Attached is a slightly improved version of the patch series, addressing > most of the issues raised in the previous message.
Cool. Sorry for the delay replying. > 0003-Extended-statistics-on-expressions-20210304.patch > > Mostly unchanged, The one improvement is removing some duplicate code in > in mvc.c. > > 0004-WIP-rework-tracking-of-expressions-20210304.patch > > This is mostly unchanged of the patch reworking how we assign artificial > attnums to expressions (negative instead of (MaxHeapAttributeNumber+i)). Looks good. I see you undid the change to get_relation_statistics() in plancat.c, which offset the attnums of plain attributes in the StatisticExtInfo struct. I was going to suggest that as a simplification to the previous 0004 patch. Related to that, is this comment in dependencies_clauselist_selectivity(): /* * Count matching attributes - we have to undo two attnum offsets. * First, the dependency is offset using the number of expressions * for that statistics, and then (if it's a plain attribute) we * need to apply the same offset as above, by unique_exprs_cnt. */ which needs updating, since there is now just one attnum offset, not two. Only the unique_exprs_cnt offset is relevant now. Also, related to that change, I don't think that stat_covers_attributes() is needed anymore. I think that the code that calls it can now just be reverted back to using bms_is_subset(), since that bitmapset holds plain attributes that aren't offset. > 0005-WIP-unify-handling-of-attributes-and-expres-20210304.patch > > This reworks how we build statistics on attributes and expressions. > Instead of treating attributes and expressions separately, this allows > handling them uniformly. > > Until now, the various "build" functions (for different statistics > kinds) extracted attribute values from sampled tuples, but expressions > were pre-calculated in a separate array. Firstly to save CPU time (not > having to evaluate expensive expressions repeatedly) and to keep the > different stats consistent (there might be volatile functions etc.). > > So the build functions had to look at the attnum, determine if it's > attribute or expression, and in some cases it was tricky / easy to get > wrong. > > This patch replaces this "split" view with a simple "consistent" > representation merging values from attributes and expressions, and just > passes that to the build functions. There's no need to check the attnum, > and handle expressions in some special way, so the build functions are > much simpler / easier to understand (at least I think so). > > The build data is represented by "StatsBuildData" struct - not sure if > there's a better name. > > I'm mostly happy with how this turned out. I'm sure there's a bit more > cleanup needed (e.g. the merging/remapping of dependencies needs some > refactoring, I think) but overall this seems reasonable. Agreed. That's a nice improvement. I wonder if dependency_is_compatible_expression() can be merged with dependency_is_compatible_clause() to reduce code duplication. It probably also ought to be possible to support "Expr IN Array" there, in a similar way to the other code in statext_is_compatible_clause(). Also, should this check rinfo->clause_relids against the passed-in relid to rule out clauses referencing other relations, in the same way that statext_is_compatible_clause() does? > I did some performance testing, I don't think there's any measurable > performance degradation. I'm actually wondering if we need to transform > the AttrNumber arrays into bitmaps in various places - maybe we should > just do a plain linear search. We don't really expect many elements, as > each statistics has 8 attnums at most. So maybe building the bitmapsets > is a net loss? The one exception might be functional dependencies, where > we can "merge" multiple statistics together. But even then it'd require > many statistics objects to make a difference. Possibly. There's a danger in trying to change too much at once though. As it stands, I think it's fairly close to being committable, with just a little more tidying up. Regards, Dean