[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203463#comment-17203463
 ] 

Shiva Jahangiri commented on ASTERIXDB-2784:
--------------------------------------------

The fix for this issue is in my next patch for hybrid hash join where I define 
NoGrow-NoSteal and Grow-Steal policies. Feel free to assign it to me.

> Join memory requirement for large objects
> -----------------------------------------
>
>                 Key: ASTERIXDB-2784
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler, RT - Runtime
>            Reporter: Chen Luo
>            Priority: Major
>
> Currently the compiler assumes the minimum number of join frames is 5 [1]. 
> However, this does not guarantee a join will always succeed in case of large 
> objects. The actual join memory requirement is actually MAX(5, #partitions * 
> #large object size). The reason is that in the spill policy [2], we only 
> spill a partition if it hasn't been spilled before. As a result, when we are 
> writing to an empty partition, it is possible that each of other partitions 
> has one large object (which could be larger than the frame size) but no 
> partition can be spilled. Thus, the join memory requirement becomes 
> #partitions * #large object size in this case.
> [1] 
> [https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29)|https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29).]
> [2] 
> https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to