rubenada commented on code in PR #3277:
URL: https://github.com/apache/calcite/pull/3277#discussion_r1241254889
##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -865,6 +866,9 @@ private static void matchFilter(SubQueryRemoveRule rule,
LogicVisitor.find(RelOptUtil.Logic.TRUE, ImmutableList.of(c), e);
final Set<CorrelationId> variablesSet =
RelOptUtil.getVariablesUsed(e.rel);
+ if (!filterVariablesSet.isEmpty()) {
+ variablesSet.retainAll(filterVariablesSet);
Review Comment:
@libenchao I have the impression that `if (!filterVariablesSet.isEmpty()) {
variablesSet.retainAll(filterVariablesSet); }` it's better than the current
code, since fixes some issues (like the current one), but the proper fix would
be just `variablesSet.retainAll(filterVariablesSet);` (without the `if` check).
When applying the fix without the `if` I got one regression in `misc.iq`,
but this is something that already appeared in
https://github.com/apache/calcite/pull/3193 and that PR contains the fix for it
(see [this
comment](https://github.com/apache/calcite/pull/3193#discussion_r1202473255))
I also found one new test case that is currently broken, and will only be
fixed if we apply the patch without the ìf`:
```
select deptno from dept d1 where exists (
select 1 from dept d2 where d2.deptno = d1.deptno and exists (
select 1 from dept d3 where d3.dname = d1.dname));
```
It is a bit "forced", because the query can be easily re-written to remove
the inner-most subquery (which does not require to be 2-level nested); but it
is a valid (although rare) scenario, which should ideally work (but since it is
a bit far-fectched and it has a straightforward workaround by rewriting the
query, I didn't put too much attention on it).
The thing is, and it seems you and I both reached the same conclusion (as
you explain
[here](https://github.com/apache/calcite/pull/3193#discussion_r1204993079)), if
we apply the patch without the `if`, I fear we can lead to some issues on
downstream projects, because the logic to initialize the variableSet of Filter
is a bit fragile, and downstream projects using RelBuilder API can create a
(wrong, incomplete) Filter that has a condition with RexSubQuery with a
correlation variable, without setting this variable in the Filter's
variableSet. Thus, when the `filterVariablesSet` is not empty, we know that is
was correctly initialized, so we can exploit that info; but when it is empty,
we can never know if it is correct (i.e. it is empty because it is supposed to
be empty), or it is incorrect (it is supposed to be **not** empty, but the
Filter was "wrongly" created without a variableSet). So, at that point, if we
exploit that info (e.g. by applying the fix without the `if`), things c
an't stop working in the second case.
That's why I thought it was safer to include the "partial fix" (retain only
if `filterVariablesSet` is not empty), even though it will not fix certain
cases (like the SQL query that I mentioned earlier, or the "last test" that you
mention on your branch, which actually comes from a different ticket
[CALCITE-5716](https://issues.apache.org/jira/browse/CALCITE-5716) ).
I would propose (given than 1.35 RC phase is approaching) a conservative
approach:
- I can close this PR & ticket, because it is superseded by
https://github.com/apache/calcite/pull/3193 (we could perhaps add my new test
in `sub-query.iq` from this PR into that PR, just for completeness).
- To avoid regressions on downstream projects,
https://github.com/apache/calcite/pull/3193 would apply the fix with the `if`;
that should fix
[CALCITE-5683](https://issues.apache.org/jira/browse/CALCITE-5683) (and my
ticket [CALCITE-5789](https://issues.apache.org/jira/browse/CALCITE-5789)); but
it shall not contain the "last test in sub-query.iq` (which would be broken
because AFAIK it requires the fix without the `if`).
- In any case, that test actually belongs to a different (related) issue:
[CALCITE-5716](https://issues.apache.org/jira/browse/CALCITE-5716). That ticket
was closed, but IMO it should be re-open; and we would need to re-evaluate
whether or not fix it (and the risks it may provoke) by removing the `if` from
the patch (in a future version).
wdyt? (sorry for the long message)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]