rubenada commented on code in PR #3277:
URL: https://github.com/apache/calcite/pull/3277#discussion_r1241254889


##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -865,6 +866,9 @@ private static void matchFilter(SubQueryRemoveRule rule,
           LogicVisitor.find(RelOptUtil.Logic.TRUE, ImmutableList.of(c), e);
       final Set<CorrelationId>  variablesSet =
           RelOptUtil.getVariablesUsed(e.rel);
+      if (!filterVariablesSet.isEmpty()) {
+        variablesSet.retainAll(filterVariablesSet);

Review Comment:
   @libenchao I have the impression that `if (!filterVariablesSet.isEmpty()) { 
variablesSet.retainAll(filterVariablesSet); }` it's better than the current 
code, since fixes some issues (like the current one), but the proper fix would 
be just `variablesSet.retainAll(filterVariablesSet);` (without the `if` check).
   When applying the fix without the `if` I got one regression in `misc.iq`, 
but this is something that already appeared in 
https://github.com/apache/calcite/pull/3193 and that PR contains the fix for it 
(see [this 
comment](https://github.com/apache/calcite/pull/3193#discussion_r1202473255))
   
   I also found one new test case that is currently broken, and will only be 
fixed if we apply the patch without the ìf`: 
   ```
   select deptno from dept d1 where exists (
    select 1 from dept d2 where d2.deptno = d1.deptno and exists (
     select 1 from dept d3 where d3.dname = d1.dname));
   ```
   It is a bit "forced", because the query can be easily re-written to remove 
the inner-most subquery (which does not require to be 2-level nested); but it 
is a valid (although rare) scenario, which should ideally work (but since it is 
a bit far-fectched and it has a straightforward workaround by rewriting the 
query, I didn't put too much attention on it).
   
   The thing is, and it seems you and I both reached the same conclusion (as 
you explain 
[here](https://github.com/apache/calcite/pull/3193#discussion_r1204993079)), if 
we apply the patch without the `if`, I fear we can lead to some issues on 
downstream projects, because the logic to initialize the variableSet of Filter 
is a bit fragile, and downstream projects using RelBuilder API can create a 
(wrong, incomplete) Filter that has a condition with RexSubQuery with a 
correlation variable, without setting this variable in the Filter's 
variableSet. Thus, when the `filterVariablesSet` is not empty, we know that is 
was correctly initialized, so we can exploit that info; but when it is empty, 
we can never know if it is correct (i.e. it is empty because it is supposed to 
be empty), or it is incorrect (it is supposed to be **not** empty, but the 
Filter was "wrongly" created without a variableSet). So, at that point, if we 
exploit that info (e.g. by applying the fix without the `if`), things c
 an't stop working in the second case.
   That's why I thought it was safer to include the "partial fix" (retain only 
if `filterVariablesSet` is not empty), even though it will not fix certain 
cases (like the SQL query that I mentioned earlier, or the "last test" that you 
mention on your branch, which actually comes from a different ticket 
[CALCITE-5716](https://issues.apache.org/jira/browse/CALCITE-5716) ).
   
   I would propose (given than 1.35 RC phase is approaching) a conservative 
approach:
   - I can close this PR & ticket, because it is superseded by 
https://github.com/apache/calcite/pull/3193 (we could perhaps add my new test 
in `sub-query.iq` from this PR into that PR, just for completeness).
   - To avoid regressions on downstream projects, 
https://github.com/apache/calcite/pull/3193 would apply the fix with the `if`; 
that should fix 
[CALCITE-5683](https://issues.apache.org/jira/browse/CALCITE-5683) (and my 
ticket [CALCITE-5789](https://issues.apache.org/jira/browse/CALCITE-5789)); but 
it shall not contain the "last test in sub-query.iq` (which would be broken 
because AFAIK it requires the fix without the `if`).
   - In any case, that test actually belongs to a different (related) issue: 
[CALCITE-5716](https://issues.apache.org/jira/browse/CALCITE-5716). That ticket 
was closed, but IMO it should be re-open; and we would need to re-evaluate 
whether or not fix it (and the risks it may provoke) by removing the `if` from 
the patch (in a future version).
   
   wdyt? (sorry for the long message)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to