[ 
https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700286#comment-17700286
 ] 

Timothy Miller edited comment on SPARK-42776 at 3/14/23 4:34 PM:
-----------------------------------------------------------------

A little more detail about the sequence of events that causes this bug:
 * org.apache.spark.sql.execution.RemoveRedundantProjects is applied
 * that causes BroadcastHashJoinExec to get created
 * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
 * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the 
hashmap object that gets broadcast
 * a few more rules are applied, followed by 
org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions
 * Only after that can I replace BroadcastHashJoinExec with a columnar 
alternative, but by then it's too late.

I can't find a way to inject extra rules before RemoveRedundantProjects or 
between it and EnsureRequirements, so there doesn't seem to be a workaround 
either.
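
To make the ordering problem concrete, here is a toy model in plain Python. It is not Spark code: the Join class, the two rule functions, and the payload strings are hypothetical stand-ins for the operators and rules named above. It only assumes what the list above reports, namely that the broadcast object is fixed when requiredChildDistribution is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Join:
    """Hypothetical stand-in for a join operator; kind is "row" or "columnar"."""
    kind: str
    broadcast_payload: Optional[str] = None  # what would end up being broadcast

    def required_child_distribution(self) -> str:
        # The payload type depends on whichever operator is in the plan
        # at the moment this method is called.
        return "row-hashmap" if self.kind == "row" else "columnar-batch"


def ensure_requirements(join: Join) -> Join:
    # Toy analogue of EnsureRequirements: fixes the payload immediately.
    return Join(join.kind, join.required_child_distribution())


def columnar_replacement(join: Join) -> Join:
    # Toy analogue of a columnar replacement rule: swaps the operator but
    # inherits whatever payload was already chosen.
    return Join("columnar", join.broadcast_payload)


def run(rules):
    # Apply the rules in order to a plan that starts as a row-based join.
    plan = Join("row")
    for rule in rules:
        plan = rule(plan)
    return plan


# Order reported above: requirements first, columnar replacement later.
reported = run([ensure_requirements, columnar_replacement])
# Order that would be needed: replacement before requirements.
desired = run([columnar_replacement, ensure_requirements])

print(reported)
print(desired)
```

In this sketch the reported order ends with a columnar operator that still carries the row-based payload, while the reversed order picks the columnar payload. That mirrors why a rule hook between the two steps would be needed.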


> BroadcastHashJoinExec.requiredChildDistribution called before columnar 
> replacement rules
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-42776
>                 URL: https://issues.apache.org/jira/browse/SPARK-42776
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.3.1
>         Environment: I'm prototyping on a Mac, but that's not really relevant.
>            Reporter: Timothy Miller
>            Priority: Major
>
> I am trying to replace BroadcastHashJoinExec with a columnar equivalent. 
> However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets 
> called BEFORE the columnar replacement rules. As a result, the object that 
> gets broadcast is the plain old hashmap created from row data. By the time 
> the columnar replacement rules are applied, it's too late to get Spark to 
> broadcast any other kind of object.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
