[jira] [Updated] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold

2015-05-16 Thread Sean Owen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-7097:
-
Fix Version/s: (was: 1.4.0)

 Partitioned tables should only consider referred partitions in query during 
 size estimation for checking against autoBroadcastJoinThreshold
 ---

 Key: SPARK-7097
 URL: https://issues.apache.org/jira/browse/SPARK-7097
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
Reporter: Yash Datta

 Currently, when deciding whether to create a broadcast hash join or a shuffled 
 hash join, the size estimate for each partitioned table involved is based on 
 the size of the entire table. As a result, many query plans use shuffled hash 
 joins even though the actual query may reference only a small number of 
 partitions (due to additional filters), so these joins could run as broadcast 
 hash joins instead.
 The query plan should consider only the size of the referenced partitions in 
 such cases.
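 The size-based decision this issue describes can be modelled with a small 
 Python sketch. It is a simplified illustration, not Spark's planner code: 
 `Partition`, `estimate_size` and `choose_join` are hypothetical names, the 
 10 MB default is assumed to match `spark.sql.autoBroadcastJoinThreshold`, 
 and the partition sizes are made up. It compares a whole-table estimate with 
 an estimate that sums only the partitions a filter actually references.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed stand-in for the default of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


@dataclass
class Partition:
    # Hypothetical model of one table partition: partition column -> value.
    spec: Dict[str, str]
    size_in_bytes: int


def estimate_size(partitions: List[Partition],
                  predicate: Callable[[Dict[str, str]], bool] = lambda spec: True) -> int:
    """Sum the sizes of only the partitions whose spec satisfies the predicate.

    With the default predicate every partition counts, which corresponds to
    estimating the size of the entire table.
    """
    return sum(p.size_in_bytes for p in partitions if predicate(p.spec))


def choose_join(estimated_size: int,
                threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> str:
    # A non-positive threshold disables broadcasting (Spark documents -1 for this);
    # otherwise a side no larger than the threshold is considered broadcastable.
    if threshold > 0 and estimated_size <= threshold:
        return "BroadcastHashJoin"
    return "ShuffledHashJoin"


# Hypothetical table: 30 daily partitions of 1 MB each (30 MB in total).
partitions = [Partition({"dt": f"2015-04-{day:02d}"}, 1024 * 1024) for day in range(1, 31)]

whole_table = estimate_size(partitions)
referenced_only = estimate_size(partitions, lambda spec: spec["dt"] == "2015-04-27")

print(choose_join(whole_table))      # ShuffledHashJoin (30 MB > 10 MB)
print(choose_join(referenced_only))  # BroadcastHashJoin (1 MB <= 10 MB)
```

 Estimating from the whole table rules out the broadcast join, while the 
 estimate restricted to the one referenced partition stays under the 
 threshold, which is the behaviour the issue asks for.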



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold

2015-04-27 Thread Yash Datta (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yash Datta updated SPARK-7097:
--
Description: 
Currently, when deciding whether to create a broadcast hash join or a shuffled 
hash join, the size estimate for each partitioned table involved is based on 
the size of the entire table. As a result, many query plans use shuffled hash 
joins even though the actual query may reference only a small number of 
partitions (due to additional filters), so these joins could run as broadcast 
hash joins instead.

The query plan should consider only the size of the referenced partitions in 
such cases.

  was:
Currently, when deciding whether to create a broadcast hash join or a shuffled 
hash join, the size estimate for each partitioned table involved is based on 
the size of the entire table. As a result, many query plans use shuffled hash 
joins even though the actual query may reference only a small number of 
partitions (due to additional filters), so these joins could run as map-side 
hash joins instead.

The query plan should consider only the size of the referenced partitions in 
such cases.


 Partitioned tables should only consider referred partitions in query during 
 size estimation for checking against autoBroadcastJoinThreshold
 ---

 Key: SPARK-7097
 URL: https://issues.apache.org/jira/browse/SPARK-7097
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
Reporter: Yash Datta
 Fix For: 1.4.0


 Currently, when deciding whether to create a broadcast hash join or a shuffled 
 hash join, the size estimate for each partitioned table involved is based on 
 the size of the entire table. As a result, many query plans use shuffled hash 
 joins even though the actual query may reference only a small number of 
 partitions (due to additional filters), so these joins could run as broadcast 
 hash joins instead.
 The query plan should consider only the size of the referenced partitions in 
 such cases.


