I’m trying to figure out how to attack the NOT IN bug, https://issues.apache.org/jira/browse/OPTIQ-373 (I thought I’d addressed it — see JdbcTest.testNotInQueryWithNull — but I wrote the test case wrong, and we do indeed have a problem.)
I’d like to translate NOT IN similarly to how we translate IN, EXISTS, NOT EXISTS and scalar sub-queries (i.e. to some combination of a join, probably an outer join, on top of an aggregate). Then have a rule that recognizes that pattern and translates to SemiJoin. That seems more roundabout than going straight to semi-join, but (1) it will allow NOT IN queries to run on back-ends that do not have an implementation of SemiJoin, and (2) it should allow us to do decorrelation, (3) it should allow us to represent cases that cannot be represented as semi-join, e.g. ‘(x NOT IN subquery) IS UNKNOWN OR otherCondition' There are a bunch of ways to translate the query where you first check then whether the sub-query returns NULL then, then run again using 2-valued boolean semantics. For example, the query SELECT * FROM d WHERE d NOT IN ( SELECT d FROM e) can be translated to SELECT * FROM d WHERE NOT EXISTS (SELECT 1 FROM e WHERE deptno IS NULL) AND deptno IN (SELECT deptno FROM e WHERE deptno IS NOT NULL) But in so doing, we have duplicated the sub-query. That places a big burden on the rule that would recognize the two identical sub-queries and convert to some variant of semi-join. So, is there a translation that only uses one copy of the query? Julian
