Paul Rogers created DRILL-5025:
----------------------------------

             Summary: ExternalSortBatch provides weak control over spill file size
                 Key: DRILL-5025
                 URL: https://issues.apache.org/jira/browse/DRILL-5025
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor


The ExternalSortBatch (ESB) operator sorts records, spilling to disk to 
control memory use. The size of the spill file is not easy to control. It is 
half the total size of the accumulated batches, and that total is determined 
by either the memory budget or the 
{{drill.exec.sort.external.group.size}} parameter. (Even with the parameter 
set, the actual file size is still half the size of the accumulated batches.)

The proposed solution is to provide an explicit parameter that sets the maximum 
spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB needs to 
spill more than this amount of data, ESB should split the spill into multiple 
files.
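
The splitting behavior proposed above could look roughly like the following 
sketch. It is illustrative only and is not Drill code: the class 
SpillFilePlanner, the method plan, and the policy of closing a file at a 
batch boundary are all assumptions made for the example. It takes the sizes 
(in bytes) of the batches selected for spilling and groups them into files 
that stay under a maximum file size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: decides how the batches chosen
// for spilling would be divided among spill files under a byte cap.
class SpillFilePlanner {

  /**
   * Groups batch sizes (in bytes) into files, in order, so that no file
   * exceeds maxFileBytes. A single batch larger than the cap still gets a
   * file of its own, since this sketch never splits a batch.
   */
  static List<List<Long>> plan(List<Long> batchSizes, long maxFileBytes) {
    if (maxFileBytes <= 0) {
      throw new IllegalArgumentException("maxFileBytes must be positive");
    }
    List<List<Long>> files = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : batchSizes) {
      // Close the current file if adding this batch would exceed the cap.
      if (!current.isEmpty() && currentBytes + size > maxFileBytes) {
        files.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    // Flush whatever remains as the last file.
    if (!current.isEmpty()) {
      files.add(current);
    }
    return files;
  }
}
```

For example, five 40-byte batches with a 100-byte cap would be planned as 
three files holding two, two, and one batch. The cap here is in bytes, in 
line with the proposal that the limit should not depend on record counts.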

The spill.size value should be specified in bytes (or MB). (A size in records 
would make the file size data-dependent, which would not be helpful.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
