On Fri, Apr 29, 2022 at 12:31 PM Tom Lane <[email protected]> wrote:
> "David G. Johnston" <[email protected]> writes: > > The fact that (baserel.rows > path->subpath->rows) here seems like a > > straight bug: there are no filters involved in this case but in the > > presence of filters baserel->rows should be strictly (<= > > path->subpath->rows), right? > > No, because the subpath's rowcount has been derated to represent the > number of rows any one worker is expected to process. Since the > SubqueryScan is below the Gather, it has to represent that number > of rows too. Like I said, this design is pretty confusing; though > I do not have a better alternative to suggest. > > Anyway, after chewing on this for awhile, it strikes me that the > sanest way to proceed is for cost_subqueryscan to compute its rows > estimate as the subpath's rows estimate times the selectivity of > the subqueryscan's quals (if any). This is what Robert was getting at, and I followed-up on. The question I ended up at is why doesn't baserel->rows already produce the value you now propose to calculate directly within cost_subquery set_baserel_size_estimates (multiplies rel->tuples - which is derated - by selectivity, sets rel->rows) set_subquery_size_estimates rel->subroot = subquery_planner(...) // my expectation is that sub_final_rel->cheapest_total_path->rows is the derated number of rows; // the fact you can reach the derated amount later by using path->subpath->rows seems to affirm this. sets rel->tuples from sub_final_rel->cheapest_total_path->rows) set_subquery_pathlist (executes the sizing call stack above, then proceeds to create_subqueryscan_path which in turn calls cost_subquery) set_rel_size ... > * By analogy to other sorts of relation-scan nodes. But those don't > have any subpath that they could consult instead. This analogy is > really a bit faulty, since SubqueryScan isn't a relation scan node > in any ordinary meaning of that term. > I did observe this copy-and-paste dynamic; I take it this is why we cannot or would not just change all of the baserel->rows usages to path->subpath->rows. > > So perhaps we should do it more like the attached, which produces > this plan for the UNION case: > > The fact this changes row counts in a costing function is bothersome - it would be nice to go the other direction and remove the if block. More generally, path->path.rows should be read-only by the time we get to costing. But I'm not out to start a revolution here either. But it feels like we are just papering over a bug in how baserel->rows is computed; per my analysis above. In short, the solution seems like it should, and in fact does here, fix the observed problem. I'm fine with that. David J.
