Robert Haas <robertmh...@gmail.com> writes: > The relevant code is in neqsel(). It estimates the fraction of rows > that will be equal, and then does 1 - that number. Evidently, the > query planner thinks that l1.l_suppkey = l2.l_suppkey would almost > always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost > always be false. I think the presumed selectivity of l1.l_suppkey = > l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little > puzzled by that function is managing to produce a selectivity estimate > of, essentially, 1.
No, I believe it's going through neqjoinsel and thence to eqjoinsel_semi. This query will have been flattened into a semijoin. I can reproduce a similarly bad estimate in the regression database: regression=# explain select * from tenk1 a where exists(select * from tenk1 b where a.thousand = b.thousand and a.twothousand <> b.twothousand); QUERY PLAN ------------------------------------------------------------------------- Hash Semi Join (cost=583.00..1067.25 rows=1 width=244) Hash Cond: (a.thousand = b.thousand) Join Filter: (a.twothousand <> b.twothousand) -> Seq Scan on tenk1 a (cost=0.00..458.00 rows=10000 width=244) -> Hash (cost=458.00..458.00 rows=10000 width=8) -> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=8) (6 rows) The problem here appears to be that we don't have any MCV list for the "twothousand" column (because it has a perfectly flat distribution), and the heuristic that eqjoinsel_semi is using for the no-MCVs case is falling down badly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers