[ https://issues.apache.org/jira/browse/DRILL-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857604#comment-15857604 ]
ASF GitHub Bot commented on DRILL-5080:
---------------------------------------

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/717

Some comments got lost in the force-push. One was about the output batch size and suggested capping it at 16 MB, because value vectors above 16 MB cause memory fragmentation.

A later fix will limit an output batch either to 64K rows (the size of an sv2) or to the number of rows that keeps the longest vector below 16 MB, whichever is smaller. The most recent commit added per-column size information so that we can enforce this limit.

For example, 64K rows of a 256-byte column fit within a single 16 MB vector (64K × 256 bytes = 16 MB). There is no reason not to allow 64K rows even when each row has four such 256-byte columns: the total batch size would be 64 MB, but no single vector would be above 16 MB.

That fix will be offered, along with tests and enabling the managed sort by default, in a subsequent PR.

> Create a memory-managed version of the External Sort operator
> -------------------------------------------------------------
>
> Key: DRILL-5080
> URL: https://issues.apache.org/jira/browse/DRILL-5080
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: ManagedExternalSortDesign.pdf
>
> We propose to create a "managed" version of the external sort operator that
> works to a clearly-defined memory limit. Attached is a design specification
> for the work.
> The project will include fixing a number of bugs related to the external
> sort, included as sub-tasks of this umbrella task.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
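The batch-sizing rule described in the comment above (at most 64K rows, and no single vector above 16 MB) can be sketched as follows. This is a minimal illustration, not Drill's actual code: the class `BatchSizeLimiter` and its methods `maxRowsPerBatch` and `batchBytes` are hypothetical names. The sketch assumes each column has a known per-row width in bytes and that a vector's size is simply rows × width. It ignores offset vectors, null bits, and allocator rounding.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not Drill's API) illustrating the rule from the comment:
 * cap an output batch at 64K rows (the sv2 limit) or at the row count that keeps
 * the widest vector within 16 MB, whichever is smaller.
 */
public class BatchSizeLimiter {
    // An sv2 addresses rows with 2-byte indexes, so a batch holds at most 64K rows.
    public static final int MAX_ROWS_SV2 = 64 * 1024;

    // Per-vector cap suggested in the comment to avoid memory fragmentation.
    public static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024;

    /**
     * Largest row count such that no vector exceeds MAX_VECTOR_BYTES,
     * assuming (simplification) each vector's size is rows * columnWidth.
     */
    public static int maxRowsPerBatch(int[] columnWidthsBytes) {
        int widest = Arrays.stream(columnWidthsBytes).max().orElse(0);
        if (widest <= 0) {
            // No sized columns: only the sv2 row limit applies.
            return MAX_ROWS_SV2;
        }
        long rowsByVectorLimit = MAX_VECTOR_BYTES / widest;
        return (int) Math.min(MAX_ROWS_SV2, rowsByVectorLimit);
    }

    /** Total batch size: the sum of all vector sizes at the given row count. */
    public static long batchBytes(int rows, int[] columnWidthsBytes) {
        long total = 0;
        for (int width : columnWidthsBytes) {
            total += (long) rows * width;
        }
        return total;
    }

    public static void main(String[] args) {
        // Four 256-byte columns, as in the comment's example.
        int[] widths = {256, 256, 256, 256};
        int rows = maxRowsPerBatch(widths);
        System.out.println("rows = " + rows); // prints "rows = 65536"
        System.out.println("batch MB = " + batchBytes(rows, widths) / (1024 * 1024)); // prints "batch MB = 64"

        // A 1024-byte column forces fewer rows: 16 MB / 1024 B = 16384.
        System.out.println("rows = " + maxRowsPerBatch(new int[] {8, 1024})); // prints "rows = 16384"
    }
}
```

The limit is driven by the widest column rather than the total row width, which is why the four-column example still gets the full 64K rows even though the batch as a whole reaches 64 MB.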