[jira] [Updated] (SPARK-42776) invalid issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Affects Version/s: 2.4.8 (was: 3.3.1)

> invalid issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Test
> Components: Windows
> Affects Versions: 2.4.8
> Reporter: Timothy Miller
> Priority: Trivial

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-42776) invalid issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Issue Type: Test (was: Bug)

> invalid issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Test
> Components: Windows
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Trivial

[jira] [Updated] (SPARK-42776) invalid issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Priority: Trivial (was: Major)

> invalid issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Windows
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Trivial

[jira] [Updated] (SPARK-42776) invalid issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Component/s: Windows (was: Optimizer)

> invalid issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Windows
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] [Updated] (SPARK-42776) invalid issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Summary: invalid issue (was: deleted issue)

> invalid issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] [Closed] (SPARK-42776) deleted issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller closed SPARK-42776.

> deleted issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] [Resolved] (SPARK-42776) deleted issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller resolved SPARK-42776.

Resolution: Invalid

> deleted issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] [Updated] (SPARK-42776) deleted issue
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Summary: deleted issue (was: BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules)

> deleted issue
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] [Updated] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Environment: (was: I'm prototyping on a Mac, but that's not really relevant.)

> BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Reporter: Timothy Miller
> Priority: Major

[jira] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776 ]

Timothy Miller deleted comment on SPARK-42776:

was (Author: JIRAUSER287471):
A little more detail about the sequence of events that cause this bug:
* org.apache.spark.sql.execution.RemoveRedundantProjects is applied
* that causes BroadcastHashJoinExec to get created
* org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
* BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast
* a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions
* Only after that can I replace BroadcastHashJoinExec with a columnar alternative, but by then it's too late.

I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either.

> BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
> Reporter: Timothy Miller
> Priority: Major

[jira] [Updated] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Miller updated SPARK-42776:

Description: (was: I am trying to replace BroadcastHashJoinExec with a columnar equivalent. However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets called BEFORE the columnar replacement rules. As a result, the object that gets broadcast is the plain old hashmap created from row data. By the time the columnar replacement rules are applied, it's too late to get Spark to broadcast any other kind of object.)

> BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
> Reporter: Timothy Miller
> Priority: Major

[jira] [Comment Edited] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700286#comment-17700286 ]

Timothy Miller edited comment on SPARK-42776 at 3/14/23 4:34 PM:

A little more detail about the sequence of events that cause this bug:
* org.apache.spark.sql.execution.RemoveRedundantProjects is applied
* that causes BroadcastHashJoinExec to get created
* org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
* BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast
* a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions
* Only after that can I replace BroadcastHashJoinExec with a columnar alternative, but by then it's too late.

I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either.

was (Author: JIRAUSER287471):
A little more detail about the sequence of events that cause this bug:
* org.apache.spark.sql.execution.RemoveRedundantProjects is applied
* that causes BroadcastHashJoinExec to get created
* org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
* BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast
* a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions

I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either.

> BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
> Reporter: Timothy Miller
> Priority: Major
>
> I am trying to replace BroadcastHashJoinExec with a columnar equivalent.
> However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets
> called BEFORE the columnar replacement rules. As a result, the object that
> gets broadcast is the plain old hashmap created from row data. By the time
> the columnar replacement rules are applied, it's too late to get Spark to
> broadcast any other kind of object.

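The ordering problem described in the comment above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark code: the Plan class, the three rule functions and run() are hypothetical names invented for the illustration, and the pipeline simply mirrors the sequence as the reporter describes it (a row-based join is created, the broadcast payload is fixed, and only afterwards does the columnar replacement run).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    """Hypothetical stand-in for a physical plan; not a Spark class."""
    join_impl: str = "unplanned"
    broadcast_payload: Optional[str] = None


def create_row_join(plan: Plan) -> Plan:
    # Models the step where the row-based join operator first appears.
    plan.join_impl = "row-based join"
    return plan


def ensure_requirements(plan: Plan) -> Plan:
    # Models the step that fixes what gets broadcast. It looks only at the
    # join that exists at this moment and never revisits the decision.
    if plan.broadcast_payload is None:
        plan.broadcast_payload = f"payload for {plan.join_impl}"
    return plan


def columnar_replacement(plan: Plan) -> Plan:
    # Models the late rule that swaps in a columnar join.
    plan.join_impl = "columnar join"
    return plan


def run(rules: List[Callable[[Plan], Plan]]) -> Plan:
    """Apply the rules in order to a fresh plan and return the result."""
    plan = Plan()
    for rule in rules:
        plan = rule(plan)
    return plan


# Order as reported: the payload is chosen before the join is replaced.
reported = run([create_row_join, ensure_requirements, columnar_replacement])

# Order the reporter would need: replacement happens before the payload is chosen.
wanted = run([create_row_join, columnar_replacement, ensure_requirements])
```

In the reported order the final plan holds a columnar join paired with a payload that was built for the row-based join, which is the mismatch the comment complains about. Swapping the last two rules removes the mismatch in this toy model; whether Spark offers a hook at that point is exactly what the reporter says they could not find.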
[jira] [Commented] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700286#comment-17700286 ]

Timothy Miller commented on SPARK-42776:

A little more detail about the sequence of events that cause this bug:
* org.apache.spark.sql.execution.RemoveRedundantProjects is applied
* that causes BroadcastHashJoinExec to get created
* org.apache.spark.sql.execution.exchange.EnsureRequirements is applied
* BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast
* a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions

I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either.

> BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
>
> Key: SPARK-42776
> URL: https://issues.apache.org/jira/browse/SPARK-42776
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Environment: I'm prototyping on a Mac, but that's not really relevant.
> Reporter: Timothy Miller
> Priority: Major
>
> I am trying to replace BroadcastHashJoinExec with a columnar equivalent.
> However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets
> called BEFORE the columnar replacement rules. As a result, the object that
> gets broadcast is the plain old hashmap created from row data. By the time
> the columnar replacement rules are applied, it's too late to get Spark to
> broadcast any other kind of object.

[jira] [Created] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
Timothy Miller created SPARK-42776:

Summary: BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
Key: SPARK-42776
URL: https://issues.apache.org/jira/browse/SPARK-42776
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.3.1
Environment: I'm prototyping on a Mac, but that's not really relevant.
Reporter: Timothy Miller

I am trying to replace BroadcastHashJoinExec with a columnar equivalent. However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets called BEFORE the columnar replacement rules. As a result, the object that gets broadcast is the plain old hashmap created from row data. By the time the columnar replacement rules are applied, it's too late to get Spark to broadcast any other kind of object.