[ 
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2393:
-------------------------------

    Issue Type: New Feature  (was: Bug)

> Simple cost estimation and auto selection of broadcast join
> -----------------------------------------------------------
>
>                 Key: SPARK-2393
>                 URL: https://issues.apache.org/jira/browse/SPARK-2393
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Zongheng Yang
>            Priority: Critical
>
> Spark SQL should support the common optimization known as cost estimations. 
> For example, each logical operator should be able to estimate its 
> cardinality, based on the estimates from its children. 
> As a first step, the framework to support doing so should be added, which 
> might include the interface for the aforementioned cardinality estimation, 
> and/or some other metrics.
> As the first proof of concept usage of this optimization, a simple 
> optimization strategy for certain equi-joins should be added: namely, if one 
> side of a qualifying join has a small estimated physical size (smaller than 
> some threshold), the planner should use a broadcast join physical plan to 
> execute the join, broadcasting the small side and streaming through the 
> bigger side. Similar concept exists in Hive and is explained 
> [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to