Justin Pryzby <pry...@telsasoft.com> writes: > I dug into this some more; I can mitigate the issue with this change:
> diff --git a/src/backend/utils/adt/selfuncs.c > b/src/backend/utils/adt/selfuncs.c > index 6a4f7b1..962a5b4 100644 > --- a/src/backend/utils/adt/selfuncs.c > +++ b/src/backend/utils/adt/selfuncs.c > @@ -2279,6 +2279,22 @@ eqjoinsel_inner(Oid operator, > nd1 = get_variable_numdistinct(vardata1, &isdefault1); > nd2 = get_variable_numdistinct(vardata2, &isdefault2); > + elog(DEBUG4, "nd %lf %lf", nd1 ,nd2); > + if (nd1>vardata1->rel->rows) nd1=vardata1->rel->rows; > + if (nd2>vardata1->rel->rows) nd2=vardata2->rel->rows; > + > + elog(DEBUG4, "nd %lf %lf", nd1 ,nd2); > + elog(DEBUG4, "rows %lf %lf", vardata1->rel->rows > ,vardata2->rel->rows); > + elog(DEBUG4, "tuples %lf %lf", vardata1->rel->tuples > ,vardata2->rel->tuples); I don't like this change too much. I agree that intuitively you would not expect the number of distinct values to exceed the possibly-restricted number of rows from the input relation, but I think this falls foul of the problem mentioned in eqjoinsel_semi's comments, namely that it's effectively double-counting the restriction selectivity. It happens to improve matters in the test case you show, but it's not exactly producing a good estimate even so; and the fact that the change is in the right direction seems like mostly an artifact of particular ndistinct and rowcount values. I note for instance that this patch would do nothing at all for the toy example you posted upthread, because nd1/nd2 are already equal to the rows estimates in that case. The core reason why you get good results for select * from a join b using (x) where x = constant is that there's a great deal of intelligence in the planner about transitive equality deductions and what to do with partially-redundant equality clauses. The reason you don't get similarly good results for select * from a join b using (x) where x < constant is that there is no comparable machinery for inequalities. Maybe there should be, but it'd be a fair bit of work to create, and we'd have to keep one eye firmly fixed on whether it slows planning down even in cases where no benefit ensues. In the meantime, I'm not sure that there are any quick-hack ways of materially improving the situation :-( regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance