On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote:
> I don't think it does really. The thing about a <> semijoin is that it
> will succeed unless *every* join key value from the inner query is equal
> to the outer key value (or is null). That's something we should consider
> to be of very low probability typically, so that the <> selectivity should
> be estimated as nearly 1.0. If the regular equality selectivity
> approaches 1.0, or when there are expected to be very few rows out of the
> inner query, then maybe the <> estimate should start to drop off from 1.0,
> but it surely doesn't move linearly with the equality selectivity.
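To make the semantics concrete, here is a brute-force sketch (not planner code) of whether one outer row survives a `<>` semi-join or anti-join against a list of inner keys, using SQL three-valued logic: a comparison involving NULL is never TRUE. The `NullableKey` type and both function names are hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nullable integer join key, for illustration only. */
typedef struct
{
    bool isnull;
    int  value;
} NullableKey;

/*
 * Semi-join on "outer.x <> inner.y": the outer row is emitted if at least
 * one inner row makes the comparison TRUE. Comparisons involving NULL
 * yield NULL, which never counts as a match.
 */
static bool
neq_semi_matches(NullableKey outer, const NullableKey *inner, int ninner)
{
    assert(ninner >= 0);

    if (outer.isnull)
        return false;           /* NULL <> anything is NULL, never TRUE */

    for (int i = 0; i < ninner; i++)
    {
        if (!inner[i].isnull && inner[i].value != outer.value)
            return true;        /* found an inner key that differs */
    }
    return false;               /* every inner key was equal or NULL */
}

/* The anti-join emits an outer row exactly when the semi-join would not. */
static bool
neq_anti_emits(NullableKey outer, const NullableKey *inner, int ninner)
{
    return !neq_semi_matches(outer, inner, ninner);
}
```

With inner keys {1, 2, NULL}, every non-null outer key finds a differing inner key, so the semi-join keeps it; with inner keys {1, 1, NULL}, an outer key of 1 is rejected. A NULL outer key never matches, so the anti-join always emits it.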
Ok, here I go like a bull in a china shop: please find attached a draft patch. Is this getting warmer?

In the comment for JOIN_SEMI I mentioned a couple of refinements I thought of, but my intuition was that we don't go in for such sensitive and discontinuous treatment of stats, so I made the simplifying assumption that the RHS always has more than one distinct value.

An anti-join on <> returns all the nulls from the LHS, and it only returns other LHS rows if there is exactly one distinct non-null value in the RHS and it happens to be equal to that row's key. But if we make the same assumption I described above, namely that there are always at least 2 distinct values on the RHS, then the join selectivity is just the LHS nullfrac.

-- 
Thomas Munro
http://www.enterprisedb.com
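Under the simplifying assumption above (the RHS always has at least two distinct non-null values), the estimates reduce to simple functions of the LHS null fraction. The following is a minimal sketch of that arithmetic only, not the code in the attached patch; `SketchJoinType` and `neq_join_selectivity_sketch` are hypothetical names and are not PostgreSQL's `JoinType` or `neqjoinsel`.

```c
#include <assert.h>

/* Hypothetical join-type tag for this sketch; not PostgreSQL's JoinType. */
typedef enum
{
    SKETCH_JOIN_SEMI,
    SKETCH_JOIN_ANTI
} SketchJoinType;

/*
 * Estimated fraction of LHS rows emitted by "lhs.x <> rhs.y", assuming
 * the RHS has at least two distinct non-null values. outer_nullfrac is
 * the fraction of LHS keys that are NULL.
 */
static double
neq_join_selectivity_sketch(SketchJoinType jointype, double outer_nullfrac)
{
    assert(outer_nullfrac >= 0.0 && outer_nullfrac <= 1.0);

    if (jointype == SKETCH_JOIN_SEMI)
        return 1.0 - outer_nullfrac;    /* every non-null LHS key finds a differing RHS key */

    return outer_nullfrac;              /* only NULL LHS keys survive the anti-join */
}
```

The two cases are complements of each other, since under this assumption a non-null LHS row always matches and a NULL LHS row never does.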
neqjoinsel-fix-v1.patch
