For the reasoning behind Correlate not being a "join", please see this thread: http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB=je-h7awehbkzjrrhd-yczgkgwzforalrz_mmc2k7wddj...@mail.gmail.com%3E
The key point is: "although correlate might look like a join, it is a weird one, so it is not a good idea to throw that complexity at an unprepared user of Join.class". Almost the same goes for SemiJoin, and it is believed that "anybody would check .isSemiJoin() before using a Join".

P1)
>>>
http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCABivua%3DHg5TUop3W%2BGJ2waJ_76%2BL0UCuKV-sS7U7gVE137%3DOTQ%40mail.gmail.com%3E
Tue, 18 Nov 2014 16:38:12 GMT
Julian: Everyone who is handed a RelNode needs to understand exactly what they are getting. If they are only expecting a regular join we can't hand them a weird join such as a semi-join. That implies that weird relational expressions become their own sub-classes and need to have special rules written for them.
<<<

If you think carefully, "semi-join" is indeed a weird thing. At least that was my conclusion. It might look like a join, but you can't blindly apply all the join rules to a semi-join. Exactly the same holds for correlations. Although a correlation looks like a join, it is a weird join. In other words, the Liskov Substitution Principle does not hold.

P2)
>>>
http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB=je-ftbds_u+gbtrsvk0w4pv_o8gdvkuwzmqdqkqotsp0...@mail.gmail.com%3E
Tue, 18 Nov 2014 18:45:13 GMT
There I give an example of a rule that would return wrong data if it happens to see a SemiJoin where it expects a "Join"
<<<

>Correlate would also need to have the 'condition'
>to represent a join condition because the FilterJoinRule relies on placing
>the join condition on the join node during filter push down

Please explain what you mean by "join condition" for Correlate. Correlate just executes the second node for each row of the first node. There is no condition there. I do not think we should add an artificial condition here and there just to please a couple of rules. If you want to filter the results of a correlation, why don't you just use Filter(Correlate)?

>2. The redundant code in #1 can be mitigated by creating base classes for
>some of the rules

I think this might work.

> use 'BiRel' instead of 'Join' since BiRel is the base class for both
>Join and Correlate.

It might be better to use some kind of "or" or "list of classes" RelOptRuleOperand, so that the rule makes explicit what it wants, if this change is ever accepted.

>3. Modify Correlate to have JoinType, SemiJoinType as well as 'condition'.

org.apache.calcite.sql.SemiJoinType#toJoinType converts SemiJoinType to JoinType. Why do you need to add JoinType to Correlate? Regarding the condition, see above.

How would those "additional methods" on Correlate help you? Do you plan to add a "WeirdJoin" interface that would be implemented by Correlate and Join? Otherwise there would not be much sense in having the same method in different class hierarchies.

Vladimir
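
To make the Liskov point concrete, here is a minimal sketch in plain Java. It does not use Calcite's classes and it is not the example from the linked thread: ToyJoin, ToySemiJoin, and commute are hypothetical names invented for illustration. The "rule" swaps the inputs of a join, which is harmless for an inner equi-join but silently changes the result once a semi-join is passed in as if it were a regular join.

```java
import java.util.ArrayList;
import java.util.List;

public class SemiJoinSketch {
    /** Toy inner equi-join on integer keys (hypothetical, not Calcite's Join). */
    static class ToyJoin {
        final List<Integer> left;
        final List<Integer> right;

        ToyJoin(List<Integer> left, List<Integer> right) {
            this.left = left;
            this.right = right;
        }

        boolean isSemiJoin() {
            return false;
        }

        ToyJoin copy(List<Integer> newLeft, List<Integer> newRight) {
            return new ToyJoin(newLeft, newRight);
        }

        /** Emits one key per matching (left, right) pair. */
        List<Integer> execute() {
            List<Integer> out = new ArrayList<>();
            for (Integer l : left) {
                for (Integer r : right) {
                    if (l.equals(r)) {
                        out.add(l);
                    }
                }
            }
            return out;
        }
    }

    /** Toy semi-join modelled as a subclass of the join, as discussed above. */
    static class ToySemiJoin extends ToyJoin {
        ToySemiJoin(List<Integer> left, List<Integer> right) {
            super(left, right);
        }

        @Override
        boolean isSemiJoin() {
            return true;
        }

        @Override
        ToyJoin copy(List<Integer> newLeft, List<Integer> newRight) {
            return new ToySemiJoin(newLeft, newRight);
        }

        /** Emits each left key at most once, and only if the right side contains it. */
        @Override
        List<Integer> execute() {
            List<Integer> out = new ArrayList<>();
            for (Integer l : left) {
                if (right.contains(l)) {
                    out.add(l);
                }
            }
            return out;
        }
    }

    /** A "rule" written with only regular joins in mind; it never checks isSemiJoin(). */
    static ToyJoin commute(ToyJoin join) {
        return join.copy(join.right, join.left);
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 1, 2);
        List<Integer> right = List.of(1, 3);
        ToyJoin inner = new ToyJoin(left, right);
        ToyJoin semi = new ToySemiJoin(left, right);
        // Inner join: swapping the inputs keeps the same matching keys.
        System.out.println(inner.execute() + " vs " + commute(inner).execute());
        // Semi-join: the same rule changes the result.
        System.out.println(semi.execute() + " vs " + commute(semi).execute());
    }
}
```

With these inputs the inner join prints "[1, 1] vs [1, 1]", while the semi-join prints "[1, 1] vs [1]". Nothing fails loudly; the caller simply gets different rows, which is why a rule that accepts the base class has to know about every "weird" subclass.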

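The statement that Correlate "just executes second node for each row of the first node" can be sketched as a nested loop. This is a toy model under stated assumptions, not Calcite's implementation: correlate, filter, and oneTo are hypothetical helpers, and rows are represented as plain strings. The point is that the right side is a function of the current left row, so there is no join condition to store on the node; any filtering is a separate step wrapped around it, in the spirit of Filter(Correlate).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class CorrelateSketch {
    /** Runs the right side once per left row; no join condition is involved. */
    static <L, R> List<String> correlate(List<L> left, Function<L, List<R>> rightForRow) {
        List<String> out = new ArrayList<>();
        for (L l : left) {
            // The right input is re-evaluated with the current left row bound.
            for (R r : rightForRow.apply(l)) {
                out.add(l + ":" + r);
            }
        }
        return out;
    }

    /** A separate filter step applied on top of any input. */
    static <T> List<T> filter(List<T> input, Predicate<T> condition) {
        List<T> out = new ArrayList<>();
        for (T row : input) {
            if (condition.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    /** Hypothetical right side that depends on the left row: the numbers 1..n. */
    static List<Integer> oneTo(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3);
        Function<Integer, List<Integer>> right = CorrelateSketch::oneTo;
        List<String> correlated = correlate(left, right);
        // Filter(Correlate): the condition lives in the filter, not in the correlate.
        List<String> filtered = filter(correlated, row -> row.endsWith(":1"));
        System.out.println(correlated);
        System.out.println(filtered);
    }
}
```

Here the correlate step yields six rows (1:1, 2:1, 2:2, 3:1, 3:2, 3:3) and the filter on top keeps the three that end in ":1".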