[jira] [Updated] (SPARK-2393) Simple cost estimation and auto selection of broadcast join

2014-07-08 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2393:


Assignee: Zongheng Yang  (was: Michael Armbrust)

 Simple cost estimation and auto selection of broadcast join
 ---

 Key: SPARK-2393
 URL: https://issues.apache.org/jira/browse/SPARK-2393
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Zongheng Yang
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2393) Simple cost estimation and auto selection of broadcast join

2014-07-08 Thread Zongheng Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2393:
-

Description: 
Spark SQL should support the common optimization known as cost estimations. For 
example, each logical operator should be able to estimate its cardinality, 
based on the estimates from its children. 

As a first step, the framework to support doing so should be added, which might 
include the interface for the aforementioned cardinality estimation, and/or 
some other metrics.

As the first proof of concept usage of this optimization, a simple optimization 
strategy for certain equi-joins should be added: namely, if one side of a 
qualifying join has a small estimated physical size (smaller than some 
threshold), the planner should use a broadcast join physical plan to execute 
the join, broadcasting the small side and streaming through the bigger side. 
Similar concept exists in Hive and is explained 
[here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].

 Simple cost estimation and auto selection of broadcast join
 ---

 Key: SPARK-2393
 URL: https://issues.apache.org/jira/browse/SPARK-2393
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Zongheng Yang
Priority: Critical

 Spark SQL should support the common optimization known as cost estimations. 
 For example, each logical operator should be able to estimate its 
 cardinality, based on the estimates from its children. 
 As a first step, the framework to support doing so should be added, which 
 might include the interface for the aforementioned cardinality estimation, 
 and/or some other metrics.
 As the first proof of concept usage of this optimization, a simple 
 optimization strategy for certain equi-joins should be added: namely, if one 
 side of a qualifying join has a small estimated physical size (smaller than 
 some threshold), the planner should use a broadcast join physical plan to 
 execute the join, broadcasting the small side and streaming through the 
 bigger side. Similar concept exists in Hive and is explained 
 [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2393) Simple cost estimation and auto selection of broadcast join

2014-07-08 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2393:
---

Issue Type: New Feature  (was: Bug)

 Simple cost estimation and auto selection of broadcast join
 ---

 Key: SPARK-2393
 URL: https://issues.apache.org/jira/browse/SPARK-2393
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Assignee: Zongheng Yang
Priority: Critical

 Spark SQL should support the common optimization known as cost estimations. 
 For example, each logical operator should be able to estimate its 
 cardinality, based on the estimates from its children. 
 As a first step, the framework to support doing so should be added, which 
 might include the interface for the aforementioned cardinality estimation, 
 and/or some other metrics.
 As the first proof of concept usage of this optimization, a simple 
 optimization strategy for certain equi-joins should be added: namely, if one 
 side of a qualifying join has a small estimated physical size (smaller than 
 some threshold), the planner should use a broadcast join physical plan to 
 execute the join, broadcasting the small side and streaming through the 
 bigger side. Similar concept exists in Hive and is explained 
 [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].



--
This message was sent by Atlassian JIRA
(v6.2#6252)