lincoln lee created FLINK-39720:
-----------------------------------

             Summary: SubQueryDecorrelator produces incorrect plans for 
correlated EXISTS with HAVING on aggregate outputs
                 Key: FLINK-39720
                 URL: https://issues.apache.org/jira/browse/FLINK-39720
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
    Affects Versions: 2.2.1, 1.20.4, 2.3.0
            Reporter: lincoln lee
            Assignee: lincoln lee


SubQueryDecorrelator.decorrelateRel(LogicalFilter) reattaches the
non-correlated remainder of a Filter condition to the rewritten input without
remapping its RexInputRefs through frame.oldToNewOutputs. When correlated
columns have been injected into the group key of the child LogicalAggregate
(which shifts the positions of the aggregate-output fields), any surviving
HAVING / Filter predicate silently points at the wrong column. The resulting
plan is structurally valid but semantically wrong.
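
The index drift described above can be modelled with a small standalone sketch.
This is a toy model in Python, not planner code: remap_input_refs, the tuple
encoding of a predicate, and the field layouts are all hypothetical, chosen
only to be consistent with the ">=($1, 3) where $1 is now r.d" observation in
the reproduction of this issue.

```python
# Toy model (hypothetical, not Flink/Calcite code): a predicate is
# (operator, input_index, literal); an input ref is just an integer index.

def remap_input_refs(predicate, old_to_new):
    """Rewrite the predicate's input index through an old->new output mapping."""
    op, index, literal = predicate
    return (op, old_to_new[index], literal)

# Aggregate output before decorrelation: GROUP BY r.f -> [f, SUM(e)].
old_fields = ["f", "SUM(e)"]
# Assumed layout after r.d is injected into the group key -> [f, d, SUM(e)].
new_fields = ["f", "d", "SUM(e)"]
# Assumed mapping, playing the role of frame.oldToNewOutputs.
old_to_new = {0: 0, 1: 2}

having = (">=", 1, 3)  # HAVING SUM(r.e) >= 3, written against old_fields

unmapped = having                                 # reattached as-is (the bug)
remapped = remap_input_refs(having, old_to_new)   # indices shifted first

print("unmapped predicate reads:", new_fields[unmapped[1]])
print("remapped predicate reads:", new_fields[remapped[1]])
```

In this model the unmapped predicate still type-checks against the new row
(index 1 exists), which is why nothing fails at planning time.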

Reproduction

  Schema (matches SubQuerySemiJoinTest): l(a INT, b BIGINT, c VARCHAR),
  r(d INT, e BIGINT, f VARCHAR).

  SELECT * FROM l
  WHERE EXISTS (
    SELECT 1 FROM r
    WHERE l.a = r.d            -- correlated WHERE
    GROUP BY r.f
    HAVING SUM(r.e) >= 3       -- non-correlated HAVING on aggregate output
  );

  Expected: HAVING applies to the SUM(r.e) column.
  Actual (before fix): HAVING applies to the injected r.d group-key column.
  The predicate is still >=($1, 3), but $1 is now r.d, not SUM(r.e), so the
  plan is silently wrong.
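
To make "silently wrong" concrete, the following sketch evaluates the query by
hand on made-up data, once with the predicate on the SUM column and once on the
injected group-key column. The rows, the [f, d, SUM(e)] layout, and the helper
matching_l_a are illustrative assumptions, not output from Flink.

```python
from collections import defaultdict

# Made-up sample data; r rows are (d, e, f), l is reduced to its column a.
r_rows = [(1, 5, "x"), (5, 1, "y")]
l_a_values = [1, 5]

# Decorrelated aggregate: GROUP BY f, d -> rows of (f, d, SUM(e)).
sums = defaultdict(int)
for d, e, f in r_rows:
    sums[(f, d)] += e
agg_rows = [(f, d, s) for (f, d), s in sorted(sums.items())]

def matching_l_a(column_index):
    """Apply `column >= 3` to the aggregate rows, then semi-join on l.a = d."""
    keep = {row[1] for row in agg_rows if row[column_index] >= 3}
    return [a for a in l_a_values if a in keep]

correct = matching_l_a(2)  # predicate on SUM(e), as the query intends
wrong = matching_l_a(1)    # predicate on d, as in the drifted plan

print("correct:", correct)
print("wrong:  ", wrong)
```

On this data the two plans return disjoint sets of l rows, and neither raises
an error.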

  Other shapes that trigger the same drift:
  - Compound HAVING: HAVING SUM(r.e) >= 3 AND MAX(r.e) < 100
  - Mixed agg + COUNT: HAVING SUM(r.e) >= 3 AND COUNT(*) > 1
  - Multiple correlated cols: WHERE l.a = r.d AND l.b = r.e ... HAVING COUNT(r.d) >= 2

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
