liuyongvs commented on code in PR #28533:
URL: https://github.com/apache/flink/pull/28533#discussion_r3471875019


##########
flink-table/flink-table-planner/src/test/resources/org/apache/flink/table/planner/plan/stream/sql/SubplanReuseTest.xml:
##########
@@ -926,11 +926,11 @@ LogicalIntersect(all=[false])
     </Resource>
     <Resource name="optimized exec plan">
       <![CDATA[
-Join(joinType=[LeftSemiJoin], where=[IS NOT DISTINCT FROM(random, random0)], 
select=[random], leftInputSpec=[JoinKeyContainsUniqueKey], 
rightInputSpec=[NoUniqueKey])
+Join(joinType=[LeftSemiJoin], where=[((random IS NULL AND random0 IS NULL) OR 
(random = random0))], select=[random], 
leftInputSpec=[JoinKeyContainsUniqueKey], rightInputSpec=[NoUniqueKey])

Review Comment:
   Root Cause Analysis of XML Test Differences
   The Difference
   After replacing Flink's ReplaceIntersectWithSemiJoinRule with Calcite's 
IntersectToSemiJoinRule, the join condition in XML test files changed from:
   
   IS NOT DISTINCT FROM($0, $1)
   to:
   
   OR(AND(IS NULL($0), IS NULL($1)), =($0, $1))
   Both forms are semantically equivalent.
   
   Root Cause: collapseExpandedIsNotDistinctFromExpr Fails Due to Asymmetric 
removeNullabilityCast
   The key difference between the two rules is how they construct the join 
condition, which determines whether RelBuilder.join() can collapse the expanded 
form back to IS NOT DISTINCT FROM.
   
   Old Rule (Flink) — Collapse Succeeds
   ReplaceIntersectWithSemiJoinRule calls 
SetOpRewriteUtil.generateEqualsCondition(), which constructs conditions using 
bare makeInputRef:
   
   // SetOpRewriteUtil.java:59-71
   RexNode leftRex  = rexBuilder.makeInputRef(leftTypes.get(key), key);
   RexNode rightRex = rexBuilder.makeInputRef(rightTypes.get(key), 
leftTypes.size() + key);
   relBuilder.or(
       rexBuilder.makeCall(EQUALS, leftRex, rightRex),       // =($0, $1)
       relBuilder.and(
           relBuilder.isNull(leftRex),                         // IS NULL($0)
           relBuilder.isNull(rightRex)));                      // IS NULL($1)
   When RelBuilder.join() is called, it internally invokes 
collapseExpandedIsNotDistinctFromExpr (RelBuilder.java:3198). The IS_NULL 
operands ($0, $1) and the EQUALS operands ($0, $1) are identical — 
removeNullabilityCast has no effect on bare InputRef nodes. The equality check 
in doCollapseExpandedIsNotDistinctFrom (RelOptUtil.java:1883) passes, and the 
condition is collapsed to IS NOT DISTINCT FROM($0, $1).
   
   New Rule (Calcite) — Collapse Fails
   IntersectToSemiJoinRule calls builder.isNotDistinctFrom(), which internally 
calls RelOptUtil.isDistinctFromInternal() (RelOptUtil.java:2417-2425). This 
generates:
   
   OR(AND(IS_NULL(CAST($0)), IS_NULL(CAST($1))),
      IS_TRUE(=(CAST($0), CAST($1))))
   The CAST nodes come from the rule's use of makeCast(leastFieldType, ...) 
(IntersectToSemiJoinRule.java:125-128) to align field types to the Intersect's 
leastRowType.
   
   When collapseExpandedIsNotDistinctFromExpr processes this condition, it 
reaches doCollapseExpandedIsNotDistinctFrom (RelOptUtil.java:1873-1890):
   
   // RelOptUtil.java:1875-1881
   final RexNode isNullInput0 = ifNull0Call.getOperands().get(0);     // 
CAST($0) — no transform
   final RexNode isNullInput1 = ifNull1Call.getOperands().get(0);     // 
CAST($1) — no transform
   
   final RexNode equalsInput0 = RexUtil
       .removeNullabilityCast(rexBuilder.getTypeFactory(), 
equalsCall.getOperands().get(0)); // $0 — CAST removed!
   final RexNode equalsInput1 = RexUtil
       .removeNullabilityCast(rexBuilder.getTypeFactory(), 
equalsCall.getOperands().get(1)); // $1 — CAST removed!
   removeNullabilityCast strips CAST nodes that only change nullability (e.g., 
CAST($0 AS INT NOT NULL) → $0). It is applied only to the EQUALS operands, not 
to the IS_NULL operands. This creates an asymmetry:
   
   isNullInput0 = CAST($0 AS type) (untouched)
   equalsInput0 = $0 (CAST stripped by removeNullabilityCast)
   Since CAST($0) ≠ $0, the equality check fails (RelOptUtil.java:1883-1884), 
and the collapse returns the original expression unchanged.
   
   After the failed collapse, simplifyUnknownAsFalse (RelBuilder.java:3201) 
takes over:
   
   IS_TRUE(=($0, $1)) → =($0, $1) (simplification removes IS_TRUE wrapper)
   Remaining CAST nodes are also simplified away
   The final output is: OR(AND(IS NULL($0), IS NULL($1)), =($0, $1))
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to