Taewoo Kim created ASTERIXDB-1628:
-------------------------------------

             Summary: The number of partitions in External Hash-Groupby is 
calculated improperly for smaller data size.
                 Key: ASTERIXDB-1628
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1628
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


If the number of frames required for a data (e.g., external file), say A,  is 
slightly larger than the number of available frames (= memory budget), say B, 
then the number of partitions may be calculated as 1 and it will cause the 
infinite cycles during the merge phase.

If the number of partition is 1, the current code assumes that there is no 
spilling due to the out of memory budget and the output of the build phase is 
directly generated as the final output. 

But, if A > B, then a spill would happen and once a partition is spilled to the 
disk, it can't be generated as the final output. So, the merge process goes to 
the next round that just creates only one partition again and tries to generate 
some as final output. But, it can't. Thus, an infinite cycle begins.

The resolution is that if A > B, we should not set the number of partition as 
one.   




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to