[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-12 Thread GitBox


cloud-fan commented on PR #37074:
URL: https://github.com/apache/spark/pull/37074#issuecomment-1181935798

   After more thought, I think we should treat a correlated subquery as a join 
in optimizer rules. In this case, once we remove the `Project`, the plan 
becomes invalid, because the subquery's outer reference, which will end up in 
the join condition, becomes ambiguous.
   
   I think your initial approach is the right direction, but let's make it more 
precise. We should keep the `Project` only if:
   1. the filter condition contains correlated subqueries, and
   2. the subquery's outer references would exist on both sides of the join if 
we removed the `Project`.
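   A minimal sketch of the two conditions above, assuming a hypothetical toy model (not Catalyst's API): an attribute is identified only by its expression ID (266 for `a#266`), and each side of the future join is reduced to the set of IDs it outputs. `SubqueryFilter` and `should_keep_project` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical toy model, NOT Catalyst's API: attributes are plain expression
# IDs (266 stands for `a#266`) and each plan side is the set of IDs it outputs.


@dataclass(frozen=True)
class SubqueryFilter:
    has_correlated_subquery: bool     # condition 1: does the filter contain one?
    outer_references: FrozenSet[int]  # IDs the subquery reads from the outer plan
    subquery_output: FrozenSet[int]   # IDs output by the subquery (future join right side)


def should_keep_project(f: SubqueryFilter, child_output: FrozenSet[int]) -> bool:
    """Return True only if dropping the Project would make an outer reference ambiguous.

    `child_output` is what the outer side (future join left side) outputs once
    the Project is removed, i.e. the output of the Project's child.
    """
    # Condition 1: without a correlated subquery there is no future join.
    if not f.has_correlated_subquery:
        return False
    # Condition 2: some outer reference would be output by both join sides.
    on_both_sides = f.outer_references & child_output & f.subquery_output
    return len(on_both_sides) > 0


if __name__ == "__main__":
    child = frozenset({266, 267})
    conflict = SubqueryFilter(True, frozenset({266}), frozenset({266, 267}))
    clean = SubqueryFilter(True, frozenset({266}), frozenset({300}))
    print(should_keep_project(conflict, child))  # True
    print(should_keep_project(clean, child))     # False
```

   Keeping both checks separate means a plain filter (no correlated subquery) never blocks the `Project` removal.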


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-11 Thread GitBox


cloud-fan commented on PR #37074:
URL: https://github.com/apache/spark/pull/37074#issuecomment-1180582242

   OK, I think `DeduplicateRelations` needs a fix. Ideally, the outer and 
inner plans should not have conflicting output attributes after analysis, but 
this local relation + project case is missed.
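   To illustrate the invariant (not the actual `DeduplicateRelations` rule), here is a toy sketch assuming a plan output is just a list of `(name, expression ID)` pairs: any ID shared by the outer and inner outputs gets a fresh ID on the inner side. `conflicting_ids` and `dedup_inner` are hypothetical helpers.

```python
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy model, NOT Spark's DeduplicateRelations: a plan output is a
# list of (name, expression ID) pairs, e.g. ("a", 266) for `a#266`.
Output = List[Tuple[str, int]]


def conflicting_ids(outer: Output, inner: Output) -> Set[int]:
    """Expression IDs that appear in both the outer and the inner plan output."""
    return {eid for _, eid in outer} & {eid for _, eid in inner}


def dedup_inner(outer: Output, inner: Output,
                fresh_ids: Iterator[int]) -> Tuple[Output, Dict[int, int]]:
    """Assign a fresh ID to every inner attribute that conflicts with the outer plan.

    Returns the rewritten inner output and the old -> new ID mapping. A real
    rule would also have to rewrite every reference to the old IDs inside the
    inner plan; this sketch only rewrites the output list.
    """
    mapping = {old: next(fresh_ids) for old in sorted(conflicting_ids(outer, inner))}
    new_inner = [(name, mapping.get(eid, eid)) for name, eid in inner]
    return new_inner, mapping


if __name__ == "__main__":
    outer = [("a", 266), ("b", 267)]
    inner = [("a", 266), ("c", 300)]
    new_inner, mapping = dedup_inner(outer, inner, count(1000))
    print(new_inner)  # [('a', 1000), ('c', 300)]
    print(mapping)    # {266: 1000}
```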





[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-11 Thread GitBox


cloud-fan commented on PR #37074:
URL: https://github.com/apache/spark/pull/37074#issuecomment-1180579396

   > This check is not accurate when there's And expression in the Join 
condition as in this case. Hence, this PR proposes to add a check whether the 
intersected attributes exist in all the children of the And expression.
   
   Can you explain the rationale? The subquery filter will eventually be turned 
into a join, so it's not clear to me how `a#266` in the join condition would be 
resolved.
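   A toy sketch of why the question matters, assuming a hypothetical expression model (not Catalyst) built from `("attr", id)`, `("eq", l, r)` and `("and", l, r)` tuples: once the filter becomes a join, a conjunct referencing an ID that both join sides output stays ambiguous no matter how the `And` is split. The conjunct `a#266 = a#266` below is an illustrative example only.

```python
from typing import List, Set, Tuple

# Hypothetical toy expressions, NOT Catalyst:
#   ("attr", id) | ("eq", left, right) | ("and", left, right)
Expr = Tuple


def split_conjuncts(e: Expr) -> List[Expr]:
    """Flatten nested ANDs into a list of conjuncts."""
    if e[0] == "and":
        return split_conjuncts(e[1]) + split_conjuncts(e[2])
    return [e]


def references(e: Expr) -> Set[int]:
    """Expression IDs referenced anywhere inside an expression."""
    if e[0] == "attr":
        return {e[1]}
    return set().union(*(references(child) for child in e[1:]))


def ambiguous_conjuncts(cond: Expr, left_ids: Set[int], right_ids: Set[int]) -> List[Expr]:
    """Conjuncts of a join condition that reference an ID output by both join sides."""
    on_both_sides = left_ids & right_ids
    return [c for c in split_conjuncts(cond) if references(c) & on_both_sides]


if __name__ == "__main__":
    cond = ("and",
            ("eq", ("attr", 266), ("attr", 266)),   # a#266 = a#266: which side is which?
            ("eq", ("attr", 267), ("attr", 300)))
    print(ambiguous_conjuncts(cond, {266, 267}, {266, 300}))
    # [('eq', ('attr', 266), ('attr', 266))]
```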

