This topic has been discussed several times in the past. Due to the semantic mismatch regarding NULL handling, NOT IN is not ordinarily equivalent to an anti-join. However, if we can prove that neither the outer expressions nor the subquery outputs can yield NULL values, it should be safe to convert NOT IN to an anti-join.
I believe we are now in a much better position to attempt this again. The planner has accumulated significant infrastructure that makes this proof straightforward and reliable. Specifically, we can now leverage the outer-join-aware-Var infrastructure to tell whether a Var comes from the nullable side of an outer join, and the not-null-attnums hash table to efficiently check whether a Var is defined NOT NULL. We also have the expr_is_nonnullable() function that is smart enough to deduce non-nullability for expressions more complex than simple Vars/Consts. Attached is a draft patch for this attempt (part of the code is adapted from an old patch [1] by David and Tom). This patch aims for a conservative implementation: the goal is not to handle every theoretical case, but to handle canonical query patterns with minimal code complexity. The patch primarily targets patterns like: SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users); ... and SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users WHERE user_id IS NOT NULL); This is a very typical syntax for exclusion. In well-modeled databases, join keys like id and user_id are very likely to be defined as NOT NULL. It seems to me that the ROI here is quite positive: the added code complexity is very low (thanks to the existing infrastructure), while the benefit is that users writing this typical pattern will finally get efficient anti-join plans without needing manual rewrites. (For the outer expressions, we could potentially also use outer query quals to prove non-nullability. This patch does not attempt to do so. Implementing this would require passing state down during the pull_up_sublinks recursion; and given that find_nonnullable_vars can fail to prove non-nullability in many cases due to the lack of const-simplification at this stage, I'm not sure whether it is worth the code complexity. Besides, I haven't fully convinced myself that doing this does not introduce correctness issues.) Any thoughts? [1] https://postgr.es/m/[email protected] - Richard
v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch
Description: Binary data
