Taewoo Kim created ASTERIXDB-1628: ------------------------------------- Summary: The number of partitions in External Hash-Groupby is calculated improperly for smaller data size. Key: ASTERIXDB-1628 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1628 Project: Apache AsterixDB Issue Type: Bug Reporter: Taewoo Kim Assignee: Taewoo Kim
If the number of frames required for a data (e.g., external file), say A, is slightly larger than the number of available frames (= memory budget), say B, then the number of partitions may be calculated as 1 and it will cause the infinite cycles during the merge phase. If the number of partition is 1, the current code assumes that there is no spilling due to the out of memory budget and the output of the build phase is directly generated as the final output. But, if A > B, then a spill would happen and once a partition is spilled to the disk, it can't be generated as the final output. So, the merge process goes to the next round that just creates only one partition again and tries to generate some as final output. But, it can't. Thus, an infinite cycle begins. The resolution is that if A > B, we should not set the number of partition as one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)