[ https://issues.apache.org/jira/browse/DRILL-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649126#comment-15649126 ]
Paul Rogers commented on DRILL-5014: ------------------------------------ Proposed solution. Set an overall ESB memory limit which is the lesser of: * The memory given by the allocator for the ESB. * The memory specified in the {{ExternalSort}} operator specification. * A new config option {{drill.exec.sot.external.memory-limit}} (mostly for testing and special cases.) As in the current code, subtract off an allowance for the memory-merge operator and the "copier." Provide a new config option to define the high-water mark for buffering: {{drill.exec.sort.external.max-buffer-ratio}} which defaults to 0.95 (the value hardcoded today.) Spilling will occur when the next batch would cause buffer usage to exceed memory budget * reserve-ratio. Once spilling occurs, spill until memory use drops below the low-water mark: {{drill.exec.sort.external.spill.to-buffer-ratio}}. For testing purposes (and special cases), repurpose {{drill.exec.sort.external.spill.threshold}} as an optional lower limit, measured in batches, of the high water mark. Similarly, define {{drill.exec.sort.external.spill.group.size}} to be the number of batches to spill (low water mark is {{threshold - group.size}}.) This new meaning should be compatible with the earlier meaning. In normal operation, the user sets nothing: the Foreman sets the memory budget, preconfigured high- and low-water marks control spilling. > ExternalSortBatch cache size, spill count differs from config setting > --------------------------------------------------------------------- > > Key: DRILL-5014 > URL: https://issues.apache.org/jira/browse/DRILL-5014 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.8.0 > Reporter: Paul Rogers > Priority: Minor > > The ExternalSortBatch (ESB) operator sorts data while spilling to disk to > remain within a defined memory footprint. Spilling occurs based on a number > of factors. Among those are two config parameters: > * {{drill.exec.sort.external.spill.group.size}}: The number of batches to > spill per spill event. > * {{drill.exec.sort.external.spill.threshold}}: The number of batches to > accumulate in memory before starting a spill event. > The expected behavior would be: > * After the accumulated batches exceeds the threshold, and > * More than "batch size" new batches have arrived since the last spill, > * Spill half the accumulated records. > That is if the threshold is 200, and the size is 150, we should accumulate > 200 batches, then spill 150 of them (leaving 50) and repeat. > The actual behavior is: > * When the accumulated records exceeds the threshold and, > * More than "batch size" new batches have arrived since the last spill, > * Spill half the accumulated records. > The above can leave more batches in memory than desired, and spill more than > desired. > The actual behavior for the (threshold=200, size=150) case is: > {code} > Before spilling, buffered batch count: 201 > After spilling, buffered batch count: 101 > Before spilling, buffered batch count: 251 > After spilling, buffered batch count: 126 > Before spilling, buffered batch count: 276 > After spilling, buffered batch count: 138 > Before spilling, buffered batch count: 288 > After spilling, buffered batch count: 144 > Before spilling, buffered batch count: 294 > After spilling, buffered batch count: 147 > Before spilling, buffered batch count: 297 > After spilling, buffered batch count: 149 > Before spilling, buffered batch count: 299 > After spilling, buffered batch count: 150 > Before spilling, buffered batch count: 300 > After spilling, buffered batch count: 150 > Before spilling, buffered batch count: 300 > {code} > In short, the actual number of batches retained in memory is twice the spill > size, **NOT** the number set in the threshold. As a result, the ESB operator > will use more memory than expected. > The work-around is to set a batch size that is half the threshold so that the > batch size (used in spill decisions) matches the actual spill count (as > implemented by the code.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)