[GitHub] drill issue #717: DRILL-5080: Memory-managed version of external sort

paul-rogers Wed, 08 Feb 2017 00:00:38 -0800

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/717
  
    Some comments got lost in the force-push. One concerned the output batch size and suggested capping it at 16 MB, because value vectors above 16 MB cause memory fragmentation. A later fix will limit an output batch to 64K rows (the size of an sv2) or, if smaller, to the number of rows at which the longest vector reaches 16 MB. The most recent commit added per-column size information so that we can enforce this limit. For example, 64K rows of a 256-byte column fit within a single 16 MB vector (64K × 256 bytes = 16 MB). So there is no reason not to allow 64K rows even when each row has four such 256-byte columns: the total batch size would be 64 MB, but no single vector would be above 16 MB.
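
    A minimal Java sketch of the sizing rule described above, assuming each column's size is known as a fixed bytes-per-value estimate and taking 1 MB as 2^20 bytes. The class and method names (`BatchSizeSketch`, `maxOutputRows`, `totalBatchBytes`) are hypothetical and are not Drill's actual API. The sketch caps rows at the sv2 limit and at the count where the widest column's vector reaches 16 MB, then reproduces the four-column example.

```java
// Hypothetical sketch (not Drill's API): cap an output batch at 64K rows
// (the capacity of an sv2) and so that no single vector exceeds 16 MB.
public class BatchSizeSketch {
    // Maximum rows addressable by a 2-byte selection vector (sv2).
    static final int MAX_ROWS = 64 * 1024; // 65,536

    // Vector size above which memory fragmentation becomes a concern.
    static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024; // 16 MB

    // columnWidths: assumed bytes-per-value estimate for each column.
    static int maxOutputRows(int[] columnWidths) {
        int widest = 1; // guard against zero-width input
        for (int w : columnWidths) {
            widest = Math.max(widest, w);
        }
        // The widest column fills its vector first, so it sets the row limit.
        long rowsByVectorLimit = MAX_VECTOR_BYTES / widest;
        return (int) Math.min(MAX_ROWS, rowsByVectorLimit);
    }

    // Total batch size is the sum of all vector sizes.
    static long totalBatchBytes(int rows, int[] columnWidths) {
        long total = 0;
        for (int w : columnWidths) {
            total += (long) rows * w;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] fourCols = {256, 256, 256, 256};
        int rows = maxOutputRows(fourCols);
        System.out.println("rows = " + rows);
        System.out.println("batch MB = " + totalBatchBytes(rows, fourCols) / (1024 * 1024));

        int[] wideCol = {1024};
        System.out.println("rows for a 1 KB column = " + maxOutputRows(wideCol));
    }
}
```

    With four 256-byte columns this yields 65,536 rows and a 64 MB batch, each vector exactly 16 MB; a single 1 KB column would instead limit the batch to 16,384 rows.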
    
    That fix, along with tests and a change that enables the managed sort by default, will be offered in a subsequent PR.
