[ https://issues.apache.org/jira/browse/DRILL-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843523#comment-15843523 ]
Rahul Challapalli commented on DRILL-5014: ------------------------------------------ The fact that the existing sort forces a spill when the memory usage reaches 95% of the allocation covers up this issue. Otherwise we would have seen an OOM error on large data sets where a lot of spilling and merging happens. Maybe a performance test would expose this? > ExternalSortBatch cache size, spill count differs from config setting > --------------------------------------------------------------------- > > Key: DRILL-5014 > URL: https://issues.apache.org/jira/browse/DRILL-5014 > Project: Apache Drill > Issue Type: Sub-task > Affects Versions: 1.8.0 > Reporter: Paul Rogers > Assignee: Paul Rogers > Priority: Minor > > The ExternalSortBatch (ESB) operator sorts data while spilling to disk to > remain within a defined memory footprint. Spilling occurs based on a number > of factors. Among those are two config parameters: > * {{drill.exec.sort.external.spill.group.size}}: The number of batches to > spill per spill event. > * {{drill.exec.sort.external.spill.threshold}}: The number of batches to > accumulate in memory before starting a spill event. > The expected behavior would be: > * After the accumulated batches exceeds the threshold, and > * More than "batch size" new batches have arrived since the last spill, > * Spill half the accumulated records. > That is if the threshold is 200, and the size is 150, we should accumulate > 200 batches, then spill 150 of them (leaving 50) and repeat. > The actual behavior is: > * When the accumulated records exceeds the threshold and, > * More than "batch size" new batches have arrived since the last spill, > * Spill half the accumulated records. > The above can leave more batches in memory than desired, and spill more than > desired. > The actual behavior for the (threshold=200, size=150) case is: > {code} > Before spilling, buffered batch count: 201 > After spilling, buffered batch count: 101 > Before spilling, buffered batch count: 251 > After spilling, buffered batch count: 126 > Before spilling, buffered batch count: 276 > After spilling, buffered batch count: 138 > Before spilling, buffered batch count: 288 > After spilling, buffered batch count: 144 > Before spilling, buffered batch count: 294 > After spilling, buffered batch count: 147 > Before spilling, buffered batch count: 297 > After spilling, buffered batch count: 149 > Before spilling, buffered batch count: 299 > After spilling, buffered batch count: 150 > Before spilling, buffered batch count: 300 > After spilling, buffered batch count: 150 > Before spilling, buffered batch count: 300 > {code} > In short, the actual number of batches retained in memory is twice the spill > size, **NOT** the number set in the threshold. As a result, the ESB operator > will use more memory than expected. > The work-around is to set a batch size that is half the threshold so that the > batch size (used in spill decisions) matches the actual spill count (as > implemented by the code.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)