[ https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-2393:
-------------------------------
    Issue Type: New Feature  (was: Bug)

> Simple cost estimation and auto selection of broadcast join
> -----------------------------------------------------------
>
>                 Key: SPARK-2393
>                 URL: https://issues.apache.org/jira/browse/SPARK-2393
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Zongheng Yang
>            Priority: Critical
>
> Spark SQL should support the common optimization known as cost estimation.
> For example, each logical operator should be able to estimate its
> cardinality, based on the estimates from its children.
> As a first step, the framework to support doing so should be added, which
> might include the interface for the aforementioned cardinality estimation,
> and/or some other metrics.
> As a first proof-of-concept use of this optimization, a simple
> optimization strategy for certain equi-joins should be added: namely, if one
> side of a qualifying join has a small estimated physical size (smaller than
> some threshold), the planner should use a broadcast join physical plan to
> execute the join, broadcasting the small side and streaming through the
> bigger side. A similar concept exists in Hive and is explained
> [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].
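
The proposal above can be illustrated with a minimal sketch. This is not Spark's code or API: the class names (Relation, Filter, EquiJoin), the estimated_size method, the selectivity parameter, the 10 MB threshold, and the strategy labels are all hypothetical, chosen only to show each operator deriving an estimate from its children and the planner picking a broadcast side when one input falls below a threshold.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical value; the issue only says "some threshold".
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class Relation:
    """Leaf operator whose physical size is assumed to be known."""
    name: str
    size_in_bytes: int

    def estimated_size(self) -> int:
        return self.size_in_bytes


@dataclass(frozen=True)
class Filter:
    """Unary operator: scales its child's estimate by an assumed selectivity."""
    child: "Plan"
    selectivity: float = 1.0  # 1.0 means "assume nothing is filtered out"

    def estimated_size(self) -> int:
        return int(self.child.estimated_size() * self.selectivity)


@dataclass(frozen=True)
class EquiJoin:
    """Binary operator: uses a pessimistic upper bound for its own output."""
    left: "Plan"
    right: "Plan"

    def estimated_size(self) -> int:
        # Assumption: with no statistics, fall back to the product of the inputs.
        return self.left.estimated_size() * self.right.estimated_size()


Plan = Union[Relation, Filter, EquiJoin]


def choose_join_strategy(join: EquiJoin,
                         threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Pick which side to broadcast, if either is estimated below the threshold."""
    left_size = join.left.estimated_size()
    right_size = join.right.estimated_size()
    # Prefer broadcasting the smaller side; the other side is streamed.
    if right_size <= left_size and right_size < threshold:
        return "broadcast right"
    if left_size < threshold:
        return "broadcast left"
    return "no broadcast"


small = Relation("dim", 1_000)
big = Relation("fact", 50 * 1024 * 1024)
print(choose_join_strategy(EquiJoin(big, small)))
print(choose_join_strategy(EquiJoin(big, big)))
```

With these made-up sizes, the first join has a right side well under the threshold and is planned as a broadcast of that side, while the second has no qualifying side and falls through to whatever non-broadcast plan the planner would otherwise use.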