[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938243#comment-14938243 ]

Reynold Xin commented on SPARK-3862:

David,

Thanks. Let's chat there. Since I created the ticket, I have new thoughts on how we can make something better with codegen, rather than writing specialized operators.

> MultiWayBroadcastInnerHashJoin
> ------------------------------
>
>                 Key: SPARK-3862
>                 URL: https://issues.apache.org/jira/browse/SPARK-3862
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> It is common to have a single fact table inner join many small dimension
> tables. We can exploit this fact and create a MultiWayBroadcastInnerHashJoin
> (or maybe just MultiwayDimensionJoin) operator that optimizes for this
> pattern.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
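The pattern named in the issue description (one fact table inner-joined with several small dimension tables) can be illustrated with a minimal sketch. This is plain Python over in-memory rows, not Spark code and not the operator proposed in the ticket or its pull requests; the function names, column names, and sample data are all hypothetical, and SQL NULL semantics are ignored. The idea shown is only that each small table is indexed once by its join key and the fact table is then streamed a single time, probing every index per row.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def build_hash_table(rows: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Index a small dimension table by its join key (the 'broadcast' side)."""
    table: Dict[Any, List[Row]] = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def multiway_inner_hash_join(
    fact_rows: Iterable[Row],
    dimensions: List[Tuple[str, Dict[Any, List[Row]]]],
) -> Iterator[Row]:
    """Stream the fact table once, probing every dimension hash table per row.

    `dimensions` is a list of (fact_key_column, hash_table) pairs.
    A fact row is dropped as soon as one probe finds no match (inner join).
    """
    for fact in fact_rows:
        partial = [dict(fact)]
        for fact_key, table in dimensions:
            matches = table.get(fact[fact_key])
            if not matches:
                partial = []
                break
            # Combine every partial result with every matching dimension row.
            partial = [{**left, **right} for left in partial for right in matches]
        yield from partial


# Hypothetical sample data: one fact table, two small dimension tables.
products = [
    {"product_id": 1, "product_name": "pen"},
    {"product_id": 2, "product_name": "ink"},
]
stores = [{"store_id": 10, "city": "Oslo"}]
sales = [
    {"product_id": 1, "store_id": 10, "amount": 3},
    {"product_id": 2, "store_id": 99, "amount": 5},  # no matching store: dropped
    {"product_id": 2, "store_id": 10, "amount": 7},
]

dims = [
    ("product_id", build_hash_table(products, "product_id")),
    ("store_id", build_hash_table(stores, "store_id")),
]
joined = list(multiway_inner_hash_join(sales, dims))
```

In this sketch the fact rows are never materialized between joins, which is the property a multi-way operator is meant to exploit compared with a chain of separate two-way joins.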
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935054#comment-14935054 ]

David Sabater commented on SPARK-3862:

Thanks, Reynold. I am really interested in this feature and happy to contribute in whatever form is useful. I see there is already a related JIRA task open:
https://issues.apache.org/jira/browse/SPARK-3863

Please let me know how I can contribute. I am attending Spark Summit EU, so we could meet there to talk about potential use cases and ways to collaborate.

Regards.
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646376#comment-14646376 ]

Reynold Xin commented on SPARK-3862:

I don't think that's extreme at all -- very plausible candidate for 1.6!
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645915#comment-14645915 ]

David Sabater commented on SPARK-3862:

This may sound too extreme, but it would be great to have an option in SparkSQL to broadcast these dimension tables before the queries are even run, which I think would speed up query execution massively (other SQL MPP engines already do this). It would be a call similar to CACHE, but one that replicates all partitions across all nodes.
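The suggestion in this comment (broadcast the dimension tables once, ahead of time, so later queries only probe them) can be sketched as follows. This is a single-process illustration in plain Python under stated assumptions: `BroadcastRegistry`, its methods, and the sample data are hypothetical names invented here, not a Spark or SparkSQL API, and replication across nodes is not modeled at all. It only shows the "build once, reuse across queries" part of the idea.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class BroadcastRegistry:
    """Hypothetical stand-in for a 'broadcast once, reuse across queries' facility."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[Any, List[Row]]] = {}
        self.builds = 0  # how many times a hash table was actually built

    def broadcast(self, name: str, rows: Iterable[Row], key: str) -> None:
        """Build the hash table for a dimension table ahead of any query."""
        table: Dict[Any, List[Row]] = {}
        for row in rows:
            table.setdefault(row[key], []).append(row)
        self._tables[name] = table
        self.builds += 1

    def lookup(self, name: str, key_value: Any) -> List[Row]:
        """Probe a previously broadcast table; no rebuild happens here."""
        return self._tables[name].get(key_value, [])


registry = BroadcastRegistry()
registry.broadcast("stores", [{"store_id": 10, "city": "Oslo"}], key="store_id")

# Two separate "queries" reuse the same pre-built table.
query_1 = [
    s for s in [{"store_id": 10, "amount": 3}]
    if registry.lookup("stores", s["store_id"])
]
query_2 = [
    s for s in [{"store_id": 99, "amount": 5}]
    if registry.lookup("stores", s["store_id"])
]
```

The cost of building the table is paid once in `broadcast`, before either query runs, which is the saving the comment is asking for.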
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392214#comment-14392214 ]

Apache Spark commented on SPARK-3862:

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/5326
[jira] [Commented] (SPARK-3862) MultiWayBroadcastInnerHashJoin
[ https://issues.apache.org/jira/browse/SPARK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187759#comment-14187759 ]

Apache Spark commented on SPARK-3862:

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2985