On Fri, 30 Sept 2022 at 13:06, Peter Geoghegan <[email protected]> wrote:
> I like the idea of just avoiding unparameterized nested loop joins
> altogether when an "equivalent" hash join plan is available because
> it's akin to an execution-time mitigation, despite the fact that it
> happens during planning. While it doesn't actually change anything in
> the executor, it is built on the observation that we have virtually
> everything to gain and nothing to lose during execution, no matter
> what happens.
I'm not sure it's a good idea to assume that performing non-parameterised Nested Loops when we shouldn't is the only shape of plan that causes us problems. We also have the case where we assume early start-up plans are favourable. For example:

SELECT * FROM t WHERE a = 1 ORDER BY b LIMIT 10;

where we have two indexes, one on t(a) and another on t(b). Should we use the t(b) index, filter out the rows that don't match a = 1, and hope we find 10 a=1 rows early in the t(b) index? Or do we use t(a) and then perform a sort? The best case for using the t(b) index is that we find 10 a=1 rows in the first 10 rows of the index scan; the worst case is that there are no rows with a=1 at all and we scan the entire index.

Having something coded into the cost model is a more generic way of addressing this issue. Provided we design the cost model correctly, we'd be able to address future issues we discover using whichever cost model infrastructure we design for this.

I understand that what you propose would be a fast way to fix this issue. However, if we went and changed the join path creation code to not add non-parameterised nested loop paths when other paths exist, then how could we ever dare to put that code back again when we come up with a better solution?

David
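To make the risk asymmetry of the early start-up plan concrete, here is a rough back-of-the-envelope sketch (this is not PostgreSQL's actual cost model; the function name is hypothetical, and it assumes the a=1 rows are spread uniformly through the t(b) index order):

```python
def expected_rows_scanned(total_rows, matching_rows, limit):
    """Expected number of t(b) index rows we must read before
    collecting `limit` rows that satisfy a = 1, assuming the
    matching rows are uniformly spread through the index order.

    The expected position of the k-th of m matches among n rows
    is k * (n + 1) / (m + 1).
    """
    if matching_rows < limit:
        # LIMIT can never be satisfied: the whole index is scanned.
        return total_rows
    return limit * (total_rows + 1) / (matching_rows + 1)

# Selective predicate: the "early start-up" plan reads ~99k rows,
# while scanning the t(a) index would touch only the 100 matches.
print(expected_rows_scanned(1_000_000, 100, 10))      # 99010.0
# Non-selective predicate: the same plan reads only ~20 rows.
print(expected_rows_scanned(1_000_000, 500_000, 10))  # ~20
```

The point of the sketch: the t(b) plan's cost swings from a handful of rows to the entire index depending on the selectivity estimate for a = 1, which is why a generic risk-aware cost model, rather than a join-specific heuristic, seems attractive here.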
