Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1101#discussion_r165156589
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
---
@@ -215,6 +206,7 @@ public BatchHolder() {
MaterializedField outputField = materializedValueFields[i];
// Create a type-specific ValueVector for this value
vector = TypeHelper.getNewVector(outputField, allocator);
+ int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff ---
@ilooner That is the point: if we know the exact value, why do we need
RecordBatchSizer? We should use RecordBatchSizer when we need sizing
information for a batch (in most cases, the incoming batch). In this case, you
are allocating memory for the value vectors of the batch you are building.

For fixed-width columns, you can get the column width for each type you are
allocating using TypeHelper.getSize. For variable-width columns,
TypeHelper.getSize assumes 50 bytes. If you want to adjust the memory you
allocate for variable-width columns in the outgoing batch based on the
incoming batch, that is when you run RecordBatchSizer on the actual incoming
batch to figure out the average size of each such column. You can also use
RecordBatchSizer on the incoming batch to figure out how many values to
allocate memory for in the outgoing batch.

Note that with your change, for just-created value vectors, variable-width
columns will return an estSize of 1, which is not what you want.
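
To make the suggestion concrete, here is a minimal, self-contained sketch of
the sizing decision being described. It does not use Drill's actual classes:
the `FIXED_WIDTHS` map is a hypothetical stand-in for `TypeHelper.getSize` on
fixed-width types, `incomingAvg` stands in for the per-column average that
`RecordBatchSizer` would compute from an actual incoming batch, and the names
`AllocationSizer`/`allocationWidth` are invented for illustration only.

```java
import java.util.Map;

// Hypothetical sketch (not Drill code): pick a per-value allocation width.
// Fixed-width types have an exact size, so no sizer is needed; variable-width
// types fall back to the incoming batch's observed average, and only when no
// incoming batch is available do we use a default estimate (50 bytes, the
// default the review mentions for TypeHelper.getSize on var-width columns).
public class AllocationSizer {

  // Stand-in for TypeHelper.getSize on fixed-width types: exact byte widths.
  static final Map<String, Integer> FIXED_WIDTHS =
      Map.of("INT", 4, "BIGINT", 8, "FLOAT8", 8);

  /**
   * @param type        column type name (hypothetical)
   * @param incomingAvg average value size observed in the incoming batch
   *                    (what RecordBatchSizer would report), or -1 if there
   *                    is no incoming batch to measure
   */
  static int allocationWidth(String type, int incomingAvg) {
    Integer fixed = FIXED_WIDTHS.get(type);
    if (fixed != null) {
      return fixed;        // exact size is known; no sizer needed
    }
    if (incomingAvg > 0) {
      return incomingAvg;  // var-width: use the measured average
    }
    return 50;             // var-width with nothing to measure: default
  }
}
```

The point of the sketch is the branch order: running a sizer on a freshly
created, empty vector (as in the diff above) measures nothing useful, which is
why a just-created var-width vector reports an estSize of 1.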
---