[ https://issues.apache.org/jira/browse/DRILL-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649214#comment-15649214 ]
Paul Rogers commented on DRILL-5011:
------------------------------------

Proposed solution:

The best solution is to revise the "copier" to track actual memory use. However, the copier is generated code, so doing so is a big project.

Short-term alternative: use a better record width estimate. ESB already maintains a running cumulative record count. Add a new count, the in-memory count (total record count - spilled record count). The ESB allocator already maintains a counter of the total memory allocated to the ESB (which, by definition, must be for the in-memory batches).

Compute a better record width estimate as:

{{record width estimate = allocated memory / in-memory record count}}

In a simple experiment:

{code}
Original record width estimate: 324
Revised record width estimate: 114
{code}

> External Sort Batch memory use depends on record width
> ------------------------------------------------------
>
>                 Key: DRILL-5011
>                 URL: https://issues.apache.org/jira/browse/DRILL-5011
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The ExternalSortBatch operator uses spill-to-disk to keep memory needs within
> a defined limit. However, the "copier" (really, the merge operation) can use
> an amount of memory determined not by the operator configuration, but by the
> width of each record.
> The copier memory limit appears to be set by the COPIER_BATCH_MEM_LIMIT value.
> However, the actual memory use is determined by the number of records that
> the copier is asked to copy. That record count comes from an estimate of row
> width based on the type of each column. Note that the row width *is not* based
> on the actual data in each row. Varchar fields, for example, are assumed to be
> 40 characters wide. If the sorter is asked to sort records with Varchar
> fields of, say, 1000 characters, then the row width estimate will be a poor
> estimator of actual width.
> Memory use is based on a target record count:
> {code}
> target record count = memory limit / estimated row width
> {code}
> Actual memory use is:
> {code}
> memory use = target record count * actual row width
> {code}
> Which is:
> {code}
> memory use = memory limit * actual row width / estimated row width
> {code}
> That is, memory use depends on the ratio of actual to estimated width. If the
> estimate is too small by a factor of 2, then we use twice as much memory as
> expected.
> Note that the memory used for the copier defaults to 20 MB, so even an error
> of 4x still means only 80 MB of memory used, which is small in comparison to
> the many GB typically allocated to ESB storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
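
The arithmetic in this thread (the revised record width estimate from the comment, and the copier memory formulas from the issue description) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name, method names, parameters, and the fallback width are hypothetical and are not taken from the Drill code base, and the real ExternalSortBatch bookkeeping may differ.

```java
public class RecordWidthSketch {

    /**
     * Revised estimate from the comment:
     * record width estimate = allocated memory / in-memory record count,
     * where in-memory record count = total record count - spilled record count.
     * The fallbackWidth parameter is an assumption of this sketch, used when
     * no records are currently held in memory (avoids division by zero).
     */
    public static long revisedWidthEstimate(long allocatedBytes,
                                            long totalRecords,
                                            long spilledRecords,
                                            long fallbackWidth) {
        long inMemoryRecords = totalRecords - spilledRecords;
        if (inMemoryRecords <= 0) {
            return fallbackWidth;
        }
        return allocatedBytes / inMemoryRecords;
    }

    /** target record count = memory limit / estimated row width */
    public static long targetRecordCount(long memoryLimitBytes, long estimatedWidth) {
        return memoryLimitBytes / estimatedWidth;
    }

    /** memory use = target record count * actual row width */
    public static long copierMemoryUse(long memoryLimitBytes,
                                       long estimatedWidth,
                                       long actualWidth) {
        return targetRecordCount(memoryLimitBytes, estimatedWidth) * actualWidth;
    }

    public static void main(String[] args) {
        long limit = 20L * 1024 * 1024; // 20 MB copier limit mentioned in the issue

        // Varchar assumed 40 bytes wide, but the data is actually 1000 bytes wide.
        long estimated = 40;
        long actual = 1000;
        System.out.println("Target record count: " + targetRecordCount(limit, estimated));
        System.out.println("Copier memory use:   " + copierMemoryUse(limit, estimated, actual));

        // Revised estimate from allocator and record counters (made-up inputs).
        System.out.println("Revised width:       "
            + revisedWidthEstimate(1_140_000, 15_000, 5_000, estimated));
    }
}
```

With an estimate derived from actual allocation, the ratio of actual to estimated width should stay closer to 1, so the copier's memory use stays closer to the configured limit.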