[ 
https://issues.apache.org/jira/browse/SPARK-16475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16475:
--------------------------------
    Description: 
A broadcast hint is a way for users to manually annotate a query and suggest a 
join method to the query optimizer. It is very useful when the query optimizer 
cannot make an optimal decision about join methods, either because it is 
conservative or because it lacks proper statistics.

The DataFrame API has had a broadcast hint since Spark 1.5. However, we do not 
have equivalent functionality in SQL queries. We propose adding a Hive-style 
broadcast hint to Spark SQL.

For more information, please see the attached document. One note about the doc: 
in addition to supporting "MAPJOIN", we should also support "BROADCASTJOIN" and 
"BROADCAST" in the hint comment, e.g. the following should be accepted:

{code}
SELECT /*+ MAPJOIN(b) */ ...

SELECT /*+ BROADCASTJOIN(b) */ ...

SELECT /*+ BROADCAST(b) */ ...
{code}
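
To make the accepted forms concrete, here is a minimal sketch in Python of how 
the three hint names could be recognized as aliases of one broadcast hint. It 
is not Spark's parser: the function name broadcast_hint_tables, the simplified 
regular expression, and the case-insensitive matching of hint names are all 
assumptions made for illustration.

```python
import re

# Hint names the issue proposes to accept; all three mean "broadcast this table".
BROADCAST_HINT_NAMES = {"MAPJOIN", "BROADCASTJOIN", "BROADCAST"}

# Matches a Hive-style hint comment such as /*+ MAPJOIN(b) */.
# Assumption: this simplified pattern stands in for a real SQL grammar.
_HINT_PATTERN = re.compile(r"/\*\+\s*(\w+)\s*\(\s*([^)]*?)\s*\)\s*\*/")


def broadcast_hint_tables(sql):
    """Return the table names in a recognized broadcast hint, or [] if none."""
    match = _HINT_PATTERN.search(sql)
    if match is None:
        return []
    name, args = match.group(1), match.group(2)
    # Assumption: hint names are compared case-insensitively.
    if name.upper() not in BROADCAST_HINT_NAMES:
        return []
    return [table.strip() for table in args.split(",") if table.strip()]
```

Under these assumptions, a query carrying any of the three hints yields the 
same list of tables to broadcast, while an ordinary comment or an unknown hint 
name yields an empty list.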



  was:
A broadcast hint is a way for users to manually annotate a query and suggest a 
join method to the query optimizer. It is very useful when the query optimizer 
cannot make an optimal decision about join methods, either because it is 
conservative or because it lacks proper statistics.

The DataFrame API has had a broadcast hint since Spark 1.5. However, we do not 
have equivalent functionality in SQL queries. We propose adding a Hive-style 
broadcast hint to Spark SQL.

For more information, please see the attached document. One note about the doc: 
in addition to supporting "MAPJOIN", we should also support "BROADCASTJOIN".



> Broadcast Hint for SQL Queries
> ------------------------------
>
>                 Key: SPARK-16475
>                 URL: https://issues.apache.org/jira/browse/SPARK-16475
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Reynold Xin
>         Attachments: BroadcastHintinSparkSQL.pdf
>
>
> A broadcast hint is a way for users to manually annotate a query and suggest 
> a join method to the query optimizer. It is very useful when the query 
> optimizer cannot make an optimal decision about join methods, either because 
> it is conservative or because it lacks proper statistics.
>
> The DataFrame API has had a broadcast hint since Spark 1.5. However, we do 
> not have equivalent functionality in SQL queries. We propose adding a 
> Hive-style broadcast hint to Spark SQL.
>
> For more information, please see the attached document. One note about the 
> doc: in addition to supporting "MAPJOIN", we should also support 
> "BROADCASTJOIN" and "BROADCAST" in the hint comment, e.g. the following 
> should be accepted:
> {code}
> SELECT /*+ MAPJOIN(b) */ ...
> SELECT /*+ BROADCASTJOIN(b) */ ...
> SELECT /*+ BROADCAST(b) */ ...
> {code}


