[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1181935798

After more thought, I think we should treat a correlated subquery as a join in optimizer rules. So in this case, once we remove the `Project`, the plan becomes invalid, because the subquery's outer reference, which will be in the join condition, becomes ambiguous.

I think your initial approach is the right direction, but let's make it more precise. We should only keep the `Project` if:
1. the filter condition contains correlated subqueries
2. the subquery's outer references exist in both join sides if we remove the `Project`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
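
The two conditions for keeping the `Project` can be sketched as a small check. This is a toy Python model, not Catalyst code: `Attr`, `should_keep_project`, and the attribute IDs are hypothetical names chosen for illustration, and a plan side is reduced to the list of attributes it outputs. It assumes "ambiguous" means that an outer reference would appear in the output of both join sides once the `Project` is gone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute reference such as a#266 (hypothetical)."""
    name: str
    expr_id: int


def should_keep_project(has_correlated_subquery, outer_refs,
                        left_output, right_output):
    """Return True if removing the Project would make the plan ambiguous.

    has_correlated_subquery: condition 1, the filter contains a correlated subquery.
    outer_refs:   attributes the subquery references from the outer plan.
    left_output:  output of the outer (left) side if the Project were removed.
    right_output: output of the subquery (right) side.
    """
    if not has_correlated_subquery:
        return False
    # Condition 2: some outer reference would resolve against both join sides.
    ambiguous = set(outer_refs) & set(left_output) & set(right_output)
    return bool(ambiguous)


a266 = Attr("a", 266)
a300 = Attr("a", 300)  # same name, different ID: no conflict

# Both sides would expose a#266, so the Project has to stay.
keep = should_keep_project(True, [a266], [a266], [a266])
# The subquery side exposes a#300 instead, so the Project can go.
drop = should_keep_project(True, [a266], [a266], [a300])
```

Here `keep` is `True` and `drop` is `False`. Comparing by attribute ID rather than by name is the design choice that matters: two attributes named `a` are only a problem when they carry the same ID.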
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1180582242

OK, I think `DeduplicateRelations` needs a fix. Ideally, the outer and inner plans should not have conflicting output attributes after analysis, but this local relation + project case is missed.
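
As a rough illustration of what "no conflicting output attributes" means, the sketch below detects attributes shared by the outer and inner outputs and gives the inner copies fresh IDs. It is a toy Python model and not how `DeduplicateRelations` is implemented: `Attr`, `find_conflicts`, `dedup_inner`, and the ID-allocation scheme are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute reference such as a#266 (hypothetical)."""
    name: str
    expr_id: int


def find_conflicts(outer_output, inner_output):
    """Attributes (same name and ID) produced by both plans."""
    return set(outer_output) & set(inner_output)


def dedup_inner(outer_output, inner_output):
    """Give every conflicting inner attribute a fresh ID.

    Returns the rewritten inner output and the old -> new mapping that
    would be needed to rewrite references inside the inner plan.
    """
    conflicts = find_conflicts(outer_output, inner_output)
    highest = max((a.expr_id for a in [*outer_output, *inner_output]), default=0)
    fresh_ids = count(highest + 1)  # assumed allocation scheme for fresh IDs
    mapping = {old: Attr(old.name, next(fresh_ids))
               for old in inner_output if old in conflicts}
    new_inner = [mapping.get(a, a) for a in inner_output]
    return new_inner, mapping


outer = [Attr("a", 266)]
inner = [Attr("a", 266), Attr("b", 267)]
new_inner, mapping = dedup_inner(outer, inner)
```

After the call, `find_conflicts(outer, new_inner)` is empty, and `mapping` records that the inner `a#266` was renamed, which is the information a real rule would use to rewrite the inner plan's expressions.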
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1180579396

> This check is not accurate when there's And expression in the Join condition as in this case. Hence, this PR proposes to add a check whether the intersected attributes exist in all the children of the And expression.

Can you explain the rationale? The subquery filter will eventually be turned into a join, so it's not clear to me how to resolve `a#266` in the join condition.