[ 
https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931054#comment-15931054
 ] 

Zhan Zhang edited comment on SPARK-20006 at 3/18/17 4:42 AM:
-------------------------------------------------------------

By default, the ShuffledHashJoin threshold can fall back to the broadcast 
threshold. A separate configuration gives us the opportunity to optimize the 
join dramatically. It would be great if the CBO could automatically find the 
best strategy, but I may be missing something. Currently the CBO does not 
collect the right statistics, especially for partitioned tables. I have opened 
a JIRA for that issue as well: https://issues.apache.org/jira/browse/SPARK-19890



> Separate threshold for broadcast and shuffled hash join
> -------------------------------------------------------
>
>                 Key: SPARK-20006
>                 URL: https://issues.apache.org/jira/browse/SPARK-20006
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Zhan Zhang
>            Priority: Minor
>
> Currently both canBroadcast and canBuildLocalHashMap use the same 
> configuration: AUTO_BROADCASTJOIN_THRESHOLD. 
> But the memory model may differ. For broadcast joins, the hash map is 
> currently always built on heap. For ShuffledHashJoin, the hash map may be 
> built on heap (longHash) or off heap (other maps, if off-heap memory is 
> enabled). Sharing one configuration makes it hard to tune (how to allocate 
> memory on heap vs. off heap). I propose using separate configurations. 
> Please comment on whether this is reasonable.
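
To make the proposal concrete, here is a minimal sketch of the two size checks 
with separate thresholds. It is a plain Python model and not Spark code: the 
functions are simplified stand-ins for canBroadcast and canBuildLocalHashMap 
as I understand them in 2.1.0, and shuffled_hash_join_threshold is a 
hypothetical setting that does not exist in Spark. When it is unset, it falls 
back to the broadcast threshold, so the current behavior is kept.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass(frozen=True)
class JoinConf:
    # Models spark.sql.autoBroadcastJoinThreshold (AUTO_BROADCASTJOIN_THRESHOLD).
    auto_broadcast_join_threshold: int = 10 * MB
    # HYPOTHETICAL: the separate threshold proposed in this issue.
    # None means "fall back to the broadcast threshold".
    shuffled_hash_join_threshold: Optional[int] = None
    # Models spark.sql.shuffle.partitions.
    num_shuffle_partitions: int = 200


def can_broadcast(size_in_bytes: int, conf: JoinConf) -> bool:
    """Simplified stand-in for canBroadcast: the whole relation must fit."""
    return size_in_bytes <= conf.auto_broadcast_join_threshold


def can_build_local_hash_map(size_in_bytes: int, conf: JoinConf) -> bool:
    """Simplified stand-in for canBuildLocalHashMap: an average partition must fit."""
    per_partition = conf.shuffled_hash_join_threshold
    if per_partition is None:
        # Assumed fallback: reuse the broadcast threshold, as today.
        per_partition = conf.auto_broadcast_join_threshold
    return size_in_bytes < per_partition * conf.num_shuffle_partitions


default = JoinConf()
tuned = JoinConf(shuffled_hash_join_threshold=1 * MB, num_shuffle_partitions=10)
for conf in (default, tuned):
    # Same inputs, different answers for the shuffled hash join check only.
    print(can_broadcast(5 * MB, conf), can_build_local_hash_map(50 * MB, conf))
```

With a separate setting, the shuffled hash join limit can be lowered or raised 
to match the on-heap/off-heap split without changing which relations are 
broadcast.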


