[ https://issues.apache.org/jira/browse/ASTERIXDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203463#comment-17203463 ]
Shiva Jahangiri commented on ASTERIXDB-2784:
--------------------------------------------

The fix for this issue is in my next patch for hybrid hash join, where I define the NoGrow-NoSteal and Grow-Steal policies. Feel free to assign it to me.

> Join memory requirement for large objects
> -----------------------------------------
>
>                 Key: ASTERIXDB-2784
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler, RT - Runtime
>            Reporter: Chen Luo
>            Priority: Major
>
> Currently the compiler assumes the minimum number of join frames is 5 [1].
> However, this does not guarantee that a join will always succeed when large
> objects are involved. The actual join memory requirement is
> MAX(5, #partitions * large object size). The reason is that the spill
> policy [2] only spills a partition if it hasn't been spilled before. As a
> result, when we are writing to an empty partition, it is possible that each
> of the other partitions holds one large object (which could be larger than
> the frame size) but no partition can be spilled. Thus, the join memory
> requirement becomes #partitions * large object size in this case.
>
> [1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29
> [2] https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55
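
For illustration only, the bound MAX(5, #partitions * large object size) from the description can be worked through as a small calculation. This is a rough sketch, not AsterixDB code: the class and method names are hypothetical, and it assumes the large object size is counted in frames by rounding the object's byte size up to whole frames.

```java
/** Hypothetical sketch of the worst-case bound from the issue; not part of AsterixDB. */
final class JoinMemorySketch {
    /** Minimum number of join frames the compiler currently assumes, per the issue. */
    static final int MIN_JOIN_FRAMES = 5;

    /** Frames needed to hold one object, rounded up (assumed way of sizing an object). */
    static long framesPerObject(long objectBytes, int frameBytes) {
        if (objectBytes <= 0 || frameBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (objectBytes + frameBytes - 1) / frameBytes;
    }

    /** MAX(5, #partitions * large object size), with the object size in frames. */
    static long worstCaseFrames(int numPartitions, long largeObjectBytes, int frameBytes) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        long perPartition = framesPerObject(largeObjectBytes, frameBytes);
        // Each partition may be pinned holding one large object that cannot be spilled.
        long pinned = Math.multiplyExact((long) numPartitions, perPartition);
        return Math.max(MIN_JOIN_FRAMES, pinned);
    }

    public static void main(String[] args) {
        // 20 partitions, 1 MiB objects, 32 KiB frames: 32 frames per object.
        System.out.println(worstCaseFrames(20, 1L << 20, 32 * 1024)); // prints 640
        // Small objects that fit in one frame: the floor of 5 frames applies.
        System.out.println(worstCaseFrames(2, 100, 32 * 1024)); // prints 5
    }
}
```

With these made-up numbers, the assumed minimum of 5 frames falls far short of the 640 frames the worst case would need, which is the gap the issue describes.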