[ 
https://issues.apache.org/jira/browse/SPARK-36113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824521#comment-17824521
 ] 

Jack Chen edited comment on SPARK-36113 at 3/7/24 8:09 PM:
-----------------------------------------------------------

The tricky aspect of this is that if we do the count bug handling in 
DecorrelateInnerQuery, we end up adding an extra outer join for scalar 
subqueries with count. This is because scalar subqueries always require a left 
join anyway (even without count) because if there's no data coming out of the 
subquery we still need to keep the row. And if we handle count bug in 
DecorrelateInnerQuery using the existing logic, it would create another left 
join, which may be in the middle of the subquery plan.

I think to solve this we want to check whether this is a count agg at the top 
of the scalar subquery, and specially handle that case to generate a single 
outer join. Adding that outer join still needs to happen in the same place 
(rewriteDomainJoins), but perhaps we can move the logic of checking whether the 
subquery is affected by count bug all to DecorrelateInnerQuery, and that just 
leaves a flag for rewriteDomainJoins later on.


was (Author: JIRAUSER299035):
The tricky aspect of this is that if we do the count bug handling in 
DecorrelateInnerQuery, we end up adding an extra outer join for scalar 
subqueries with count. This is because scalar subqueries always require a left 
join anyway (even without count) because if there's no data coming out of the 
subquery we still need to keep the row. And if we handle count bug in 
DecorrelateInnerQuery using the existing logic, it would create another left 
join, which may be in the middle of the subquery plan.

I think to solve this we want to check whether this is a count agg at the top 
of the scalar subquery, and specially handle that case to generate a single 
outer join.

> Unify the logic to handle COUNT bug for scalar and lateral subqueries
> ---------------------------------------------------------------------
>
>                 Key: SPARK-36113
>                 URL: https://issues.apache.org/jira/browse/SPARK-36113
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Currently, for scalar subqueries, the logic to handle the count bug is in 
> `RewriteCorrelatedScalarSubquery` (constructLeftJoins) while lateral 
> subqueries rely on `DecorrelateInnerQuery` to handle the COUNT bug. 
> We should unify them without introducing additional left outer joins for 
> scalar subqueries.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to