[
https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yash Datta updated SPARK-7097:
--
Description:
Currently, when deciding whether to create a HashJoin or a ShuffledHashJoin,
the size estimation for partitioned tables considers the size of the entire
table. As a result, many query plans use shuffle hash joins even when the
actual query references only a small number of partitions (due to additional
filters) and could therefore be run using a BroadCastHash join.
The query plan should consider the size of only the referenced partitions in
such cases.
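A minimal sketch of the proposed estimation, assuming a hypothetical map of
per-partition sizes and the default 10 MB autoBroadcastJoinThreshold (the
object and method names here are illustrative, not Spark internals):

```scala
// Illustrative sketch: compare the whole-table estimate (current
// behaviour) with an estimate over only the partitions the query
// actually reads after partition pruning (proposed behaviour).
object PartitionSizeEstimationSketch {
  // Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB.
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

  // Pick a join strategy from an estimated build-side size in bytes.
  def chooseJoin(estimatedSize: Long): String =
    if (estimatedSize <= autoBroadcastJoinThreshold) "BroadcastHashJoin"
    else "ShuffledHashJoin"

  // Current behaviour: the estimate is the size of the entire table.
  def wholeTableEstimate(partitionSizes: Map[String, Long]): Long =
    partitionSizes.values.sum

  // Proposed behaviour: sum only the partitions that survive the
  // query's partition filters.
  def prunedEstimate(partitionSizes: Map[String, Long],
                     referenced: Set[String]): Long =
    partitionSizes.collect { case (p, s) if referenced(p) => s }.sum
}
```

With a table of, say, one 5 MB partition and one 512 MB partition, a query
filtered to the small partition gets a BroadcastHashJoin under the pruned
estimate, while the whole-table estimate would force a ShuffledHashJoin.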
was:
Currently, when deciding whether to create a HashJoin or a ShuffledHashJoin,
the size estimation for partitioned tables considers the size of the entire
table. As a result, many query plans use shuffle hash joins even when the
actual query references only a small number of partitions (due to additional
filters) and could therefore be run using a Map side hash join.
The query plan should consider the size of only the referenced partitions in
such cases.
Partitioned tables should only consider referred partitions in query during
size estimation for checking against autoBroadcastJoinThreshold
---
Key: SPARK-7097
URL: https://issues.apache.org/jira/browse/SPARK-7097
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
Reporter: Yash Datta
Fix For: 1.4.0
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)