[https://issues.apache.org/jira/browse/DRILL-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Pritesh Maker updated DRILL-6161:
---------------------------------
Reviewer: Paul Rogers
> Allocate memory for outgoing vectors based on sizing calculations
> -----------------------------------------------------------------
>
> Key: DRILL-6161
> URL: https://issues.apache.org/jira/browse/DRILL-6161
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.12.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.13.0
>
>
> Currently, Drill allocates memory for outgoing value vectors in one of two
> ways: up front for the maximum of 64K entries, or starting at 4096 entries and
> doubling whenever more memory is needed. Every time we double, we allocate a
> new vector, copy the old contents into it, and zero-fill the new half. This
> has a performance penalty. As part of the batch sizing project, we limit the
> number of rows in the outgoing batch based on memory, using sizing
> information from the incoming batch(es). Since we know the number of rows and
> the average size of each column in the outgoing batch, we should use that
> information to preallocate memory for the outgoing vectors. This will be done
> as each operator is changed to produce batches of the configured size.
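To make the cost of doubling concrete, here is a minimal sketch. It assumes a single column with a known average width and counts only the bytes copied during growth. The class and method names are hypothetical and are not part of Drill's ValueVector API. It compares one allocation sized as rows times average width against growth by doubling from 4096 entries.

```java
// Hypothetical sketch: NOT Drill's ValueVector API. Models one column with a
// known average width to compare doubling growth against preallocation.
public class PreallocationSketch {

    /** Bytes for a single upfront allocation sized from row count and average width. */
    static long preallocBytes(int rowCount, int avgWidthBytes) {
        return (long) rowCount * avgWidthBytes;
    }

    /**
     * Simulates growth by doubling: start with room for initialRows values and
     * double until targetRows fit. Each doubling copies the old contents
     * (capacity * width bytes) into the new buffer. The newly added upper half
     * (another capacity * width bytes) would also be zero-filled, but that is
     * not counted here.
     */
    static long bytesCopiedByDoubling(int initialRows, int targetRows, int avgWidthBytes) {
        long copied = 0;
        long capacity = initialRows;
        while (capacity < targetRows) {
            copied += capacity * avgWidthBytes; // old buffer copied into new one
            capacity *= 2;                      // new buffer is twice as large
        }
        return copied;
    }

    public static void main(String[] args) {
        int rows = 32768;
        int width = 8; // e.g. an 8-byte fixed-width column (assumed)
        System.out.println("preallocate: " + preallocBytes(rows, width) + " bytes, 0 copied");
        System.out.println("doubling from 4096: "
                + bytesCopiedByDoubling(4096, rows, width) + " bytes copied");
    }
}
```

With 32,768 rows of 8 bytes, preallocation performs one 262,144-byte allocation. Doubling from 4096 goes through three intermediate buffers and copies 229,376 bytes along the way. This overhead is what sizing-based preallocation avoids.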
> Another possible improvement is to pack the value vectors as densely as
> possible to improve overall memory utilization. Since we allocate memory in
> powers of 2, once we determine the number of rows to include in the outgoing
> batch, we should round it down to the closest power of 2 and allocate memory
> for that many rows.
>
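The rounding rule in the second paragraph can be sketched as follows. The ticket does not give a formula, so this sketch makes four assumptions: a per-batch memory budget, an average row width, a 65,536-row cap standing in for "64k entries", and buffers rounded up to the next power of 2. All names are hypothetical.

```java
// Hypothetical sketch: pick an outgoing row count from a memory budget, then
// round it down to a power of 2 so power-of-2 buffers are fully used.
// The budget, the 65,536-row cap, and all names are illustrative assumptions.
public class RowCountSketch {
    static final int MAX_ROWS = 65536; // assumed "64k entries" cap per batch

    /** avgRowWidthBytes must be positive. */
    static int rowsForBudget(long budgetBytes, int avgRowWidthBytes) {
        long rows = budgetBytes / avgRowWidthBytes;
        rows = Math.max(1, Math.min(rows, MAX_ROWS)); // clamp to [1, MAX_ROWS]
        return Integer.highestOneBit((int) rows);     // round down to a power of 2
    }

    /** Fraction of a power-of-2 buffer used by rows * widthBytes of data. */
    static double utilization(int rows, int widthBytes) {
        long used = (long) rows * widthBytes;
        long buffer = Long.highestOneBit(used);
        if (buffer < used) {
            buffer <<= 1; // round the allocation up to the next power of 2
        }
        return (double) used / buffer;
    }

    public static void main(String[] args) {
        // 16 MiB budget with 300-byte rows: 55,924 rows fit, rounded down to 32,768.
        int rows = rowsForBudget(16L * 1024 * 1024, 300);
        System.out.println("rows: " + rows);
        // For an 8-byte column, 55,924 rows leave part of a 512 KiB buffer unused,
        // while 32,768 rows fill a 256 KiB buffer exactly.
        System.out.println("utilization at 55924 rows: " + utilization(55924, 8));
        System.out.println("utilization at " + rows + " rows: " + utilization(rows, 8));
    }
}
```

`Integer.highestOneBit` returns the largest power of 2 that does not exceed its argument, which is exactly the "round down" step. The tradeoff is that each batch carries fewer rows in exchange for vector buffers with no unused tail.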
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)