Hi folks,

I have a question about org.apache.calcite.rel.RelNode#getVariablesSet.
The Javadoc says it returns the variables that are set by the current node:

  /**
   * Returns the variables that are set in this relational
   * expression but also used and therefore not available to parents of this
   * relational expression.
   *
   * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
   * variables.
   *
   * @return Names of variables which are set in this relational
   *   expression
   */
  Set<CorrelationId> getVariablesSet();


But I have a plan in which a node returns all variables used by its child
nodes, regardless of whether each variable is set by the current node or by a
parent node.

The original query is:

SELECT *
  FROM t1 as "outer"
 WHERE a > (
       SELECT COUNT(*)
         FROM t1 as "inner"
        WHERE "inner".a IN (
              SELECT * 
                FROM table(system_range("inner".a, "inner".b + "outer".b))
        )
 )

After SQL-to-Rel translation I get the following plan:

LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalFilter(condition=[>($2, $SCALAR_QUERY({
        LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
          LogicalFilter(condition=[IN($2, {
                LogicalProject(X=[$0])
                  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
                })], variablesSet=[[$cor0]])
            LogicalTableScan(table=[[PUBLIC, T1]])
        }))], variablesSet=[[$cor2]])
    LogicalTableScan(table=[[PUBLIC, T1]])

Each LogicalFilter introduces its own correlation variable, and everything is
OK so far.
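
To spell out the reading of getVariablesSet() assumed here, below is a toy
model of the plan above in plain Java. It is only a sketch and does not use
Calcite classes: ToyNode, variablesSet, variablesUsed and freeVariables() are
names made up for illustration, and the sub-queries are modelled as ordinary
inputs for simplicity.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a relational node; hypothetical, NOT a Calcite class. */
final class ToyNode {
    final String name;
    final Set<String> variablesSet;   // variables this node itself defines
    final Set<String> variablesUsed;  // variables referenced directly by this node
    final List<ToyNode> inputs;

    ToyNode(String name, Set<String> variablesSet, Set<String> variablesUsed,
            ToyNode... inputs) {
        this.name = name;
        this.variablesSet = variablesSet;
        this.variablesUsed = variablesUsed;
        this.inputs = Arrays.asList(inputs);
    }

    /** Variables used in this subtree that no node in the subtree sets. */
    Set<String> freeVariables() {
        Set<String> free = new LinkedHashSet<>(variablesUsed);
        for (ToyNode input : inputs) {
            free.addAll(input.freeVariables());
        }
        free.removeAll(variablesSet);
        return free;
    }
}

final class VariablesSetDemo {
    static Set<String> vars(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /** Inner filter: sets cor0; its sub-query uses cor0 and cor2. */
    static ToyNode innerFilter() {
        ToyNode functionScan =
            new ToyNode("LogicalTableFunctionScan", vars(), vars("cor0", "cor2"));
        ToyNode scan = new ToyNode("LogicalTableScan", vars(), vars());
        return new ToyNode("LogicalFilter(inner)", vars("cor0"), vars(),
            scan, functionScan);
    }

    /** Outer filter: sets cor2; the inner filter sits in its sub-query. */
    static ToyNode outerFilter() {
        ToyNode scan = new ToyNode("LogicalTableScan", vars(), vars());
        return new ToyNode("LogicalFilter(outer)", vars("cor2"), vars(),
            scan, innerFilter());
    }

    public static void main(String[] args) {
        for (ToyNode node : Arrays.asList(innerFilter(), outerFilter())) {
            System.out.println(node.name + " sets " + node.variablesSet
                + ", leaves free " + node.freeVariables());
        }
    }
}
```

In this model the inner filter sets only cor0 and leaves cor2 free for a
parent to provide, while the outer filter sets only cor2 and leaves nothing
free. Under this reading, "variables set" and "variables used below" are two
different sets.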

But then I apply SubQueryRemoveRule, and the new plan looks like this:

LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
    LogicalFilter(condition=[>($2, $7)])
      LogicalCorrelate(correlation=[$cor2], joinType=[left], requiredColumns=[{3}])
        LogicalTableScan(table=[[PUBLIC, T1]])
        LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
          LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
            LogicalJoin(condition=[=($2, $7)], joinType=[inner])
              LogicalTableScan(table=[[PUBLIC, T1]])
              LogicalAggregate(group=[{0}])
                LogicalProject(X=[$0])
                  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])


At this point LogicalJoin.getVariablesSet() returns both the "cor0" and "cor2"
variables, which doesn't seem right.

Is this behaviour expected, or is it a bug?

-- 
Regards,
Konstantin Orlov

