[jira] [Commented] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945862#comment-15945862 ]

Paul Rogers commented on DRILL-5025:

Primarily a development issue; hard to test at the QA level.

> ExternalSortBatch provides weak control over spill file size
>
>                 Key: DRILL-5025
>                 URL: https://issues.apache.org/jira/browse/DRILL-5025
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
> The ExternalSortBatch (ESB) operator sorts records while spilling to disk to
> control memory use. The size of the spill file is not easy to control. It is
> a function of the size of the accumulated batches (half of the accumulated
> total), which is determined by either the memory budget or the
> {{drill.exec.sort.external.group.size}} parameter. (But, even with the
> parameter, the actual file size is still half the size of the accumulated
> batches.)
>
> The proposed solution is to provide an explicit parameter that sets the
> maximum spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB
> needs to spill more than this amount of data, the ESB should split the spill
> into multiple files.
>
> The spill.size should be in bytes (or MB). (A size in records makes the file
> size data-dependent, which would not be helpful.)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
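A minimal Java sketch contrasting the two sizing rules described in the issue, assuming spill data can be modelled as a list of batch sizes in bytes. The class and method names (`SpillPlanSketch`, `currentSpillFileBytes`, `planSpillFiles`) are hypothetical and are not part of Drill's `ExternalSortBatch`; only the proposed `drill.exec.sort.external.spill.size` name comes from the issue. The sketch also assumes batches are never split, so a single batch larger than the cap still gets a file of its own.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Drill's actual ExternalSortBatch code).
 * All sizes are in bytes.
 */
public class SpillPlanSketch {

    /** Current behavior as described: one spill file holding half of the accumulated batches. */
    static long currentSpillFileBytes(List<Long> accumulatedBatchBytes) {
        long total = 0;
        for (long b : accumulatedBatchBytes) {
            total += b;
        }
        return total / 2;
    }

    /**
     * Proposed behavior: group the batches to spill into files no larger than
     * maxSpillFileBytes (standing in for the proposed spill.size parameter).
     * Batches are assumed indivisible, so an oversized batch gets its own file.
     */
    static List<List<Long>> planSpillFiles(List<Long> batchesToSpill, long maxSpillFileBytes) {
        if (maxSpillFileBytes <= 0) {
            throw new IllegalArgumentException("maxSpillFileBytes must be positive");
        }
        List<List<Long>> files = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long batch : batchesToSpill) {
            // Close the current file if adding this batch would exceed the cap.
            if (!current.isEmpty() && currentBytes + batch > maxSpillFileBytes) {
                files.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(batch);
            currentBytes += batch;
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Six 40 MB batches accumulated; today half of that total goes into one file.
        List<Long> accumulated = List.of(40 * mb, 40 * mb, 40 * mb, 40 * mb, 40 * mb, 40 * mb);
        System.out.println("current file size (MB): " + currentSpillFileBytes(accumulated) / mb);

        // With a hypothetical 100 MB cap, spilling three 40 MB batches yields two files.
        List<List<Long>> files = planSpillFiles(accumulated.subList(0, 3), 100 * mb);
        System.out.println("spill files with 100 MB cap: " + files.size());
    }
}
```

Running `main` prints a 120 MB single-file size for the current rule, while the capped plan produces two files (80 MB and 40 MB). The cap is expressed in bytes, matching the issue's point that a record-count limit would make file size data-dependent.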
[jira] [Commented] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655275#comment-15655275 ]

Paul Rogers commented on DRILL-5025:

Cancelling for now; spill file size is determined by the spill/respill strategy and is best discussed in that context.

> ExternalSortBatch provides weak control over spill file size
>
>                 Key: DRILL-5025
>                 URL: https://issues.apache.org/jira/browse/DRILL-5025
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor