[ 
https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhan Zhang updated SPARK-20006:
-------------------------------
    Description: 
Currently both canBroadcast and canBuildLocalHashMap use the same 
configuration: AUTO_BROADCASTJOIN_THRESHOLD. 

But the memory model may differ. For broadcast, the hash map is currently 
always built on heap. For shuffledHashJoin, the hash map may be built on heap 
(longHash) or off heap (other maps, if off-heap memory is enabled). Sharing 
one setting makes it hard to tune, because it is unclear how to divide memory 
between on-heap and off-heap. I propose using separate configurations. Please 
comment on whether this is reasonable.

  was:
Currently both canBroadcast and canBuildLocalHashMap use the same 
configuration: AUTO_BROADCASTJOIN_THRESHOLD. 

But the memory model may differ. For broadcast, the hash map is currently 
always built on heap. For shuffledHashJoin, the hash map may be built on heap 
(longHash) or off heap (other maps, if off-heap memory is enabled). Sharing 
one setting makes it hard to tune, because it is unclear how to divide memory 
between on-heap and off-heap. I propose using separate configurations.


> Separate threshold for broadcast and shuffled hash join
> -------------------------------------------------------
>
>                 Key: SPARK-20006
>                 URL: https://issues.apache.org/jira/browse/SPARK-20006
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Zhan Zhang
>            Priority: Minor
>
> Currently both canBroadcast and canBuildLocalHashMap use the same 
> configuration: AUTO_BROADCASTJOIN_THRESHOLD. 
> But the memory model may differ. For broadcast, the hash map is currently 
> always built on heap. For shuffledHashJoin, the hash map may be built on 
> heap (longHash) or off heap (other maps, if off-heap memory is enabled). 
> Sharing one setting makes it hard to tune, because it is unclear how to 
> divide memory between on-heap and off-heap. I propose using separate 
> configurations. Please comment on whether this is reasonable.
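
To make the proposal concrete, here is a minimal sketch of the two planner
checks reading separate thresholds. It is a simplified model in plain Python,
not Spark code: JoinConf, its field names, and the example sizes are
hypothetical, and scaling the shuffled-hash limit by the number of shuffle
partitions is an assumption about how the check is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinConf:
    """Hypothetical settings; these are NOT real Spark configuration keys."""
    broadcast_threshold: int      # bytes; broadcast hash map is built on heap
    shuffled_hash_threshold: int  # bytes per partition; map may be on or off heap
    num_shuffle_partitions: int


def can_broadcast(size_in_bytes: int, conf: JoinConf) -> bool:
    # The broadcast check reads only the broadcast threshold.
    return 0 <= size_in_bytes <= conf.broadcast_threshold


def can_build_local_hash_map(size_in_bytes: int, conf: JoinConf) -> bool:
    # The shuffled hash join check reads only its own threshold.
    # Assumption: the limit is per partition, so it scales with the
    # number of shuffle partitions.
    return size_in_bytes < conf.shuffled_hash_threshold * conf.num_shuffle_partitions


MB = 1024 * 1024
conf = JoinConf(broadcast_threshold=10 * MB,
                shuffled_hash_threshold=64 * MB,
                num_shuffle_partitions=200)
```

With separate settings, lowering one limit (for example, setting
shuffled_hash_threshold to 0 to rule out local hash maps) leaves the
broadcast decision unchanged, so on-heap and off-heap budgets can be tuned
independently.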



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

