Github user ppadma commented on a diff in the pull request:
https://github.com/apache/drill/pull/1101#discussion_r165156589
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
---
@@ -215,6 +206,7 @@ public BatchHolder() {
MaterializedField outputField = materializedValueFields[i];
// Create a type-specific ValueVector for this value
vector = TypeHelper.getNewVector(outputField, allocator);
+ int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff ---
@ilooner That is the point: if we know the exact value, why do we need
RecordBatchSizer? We should use RecordBatchSizer when we need sizing
information for a batch (in most cases, the incoming batch). In this case, you
are allocating memory for the value vectors of the batch you are building.

For fixed-width columns, you can get the column width for each type you are
allocating using TypeHelper.getSize. For variable-width columns,
TypeHelper.getSize assumes 50 bytes. If you want to adjust the memory you
allocate for variable-width columns in the outgoing batch based on the
incoming batch, that is when you run RecordBatchSizer on the actual incoming
batch to figure out the average size of each such column. You can also use
RecordBatchSizer on the incoming batch to figure out how many values to
allocate memory for in the outgoing batch.

Note that with your change, for just-created value vectors, variable-width
columns will return an estSize of 1, which is not what you want.
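
To make the suggestion concrete, here is a minimal, self-contained sketch of
the sizing decision being described. It does not use Drill's actual classes:
the `FIXED_WIDTHS` map is a hypothetical stand-in for `TypeHelper.getSize` on
fixed-width types, `incomingAvg` stands in for the per-column average that
`RecordBatchSizer` would compute from an actual incoming batch, and the names
`AllocationSizer`/`allocationWidth` are invented for illustration only.

```java
import java.util.Map;

// Hypothetical sketch (not Drill code): pick a per-value allocation width.
// Fixed-width types have an exact size, so no sizer is needed; variable-width
// types fall back to the incoming batch's observed average, and only when no
// incoming batch is available do we use a default estimate (50 bytes, the
// default the review mentions for TypeHelper.getSize on var-width columns).
public class AllocationSizer {

  // Stand-in for TypeHelper.getSize on fixed-width types: exact byte widths.
  static final Map<String, Integer> FIXED_WIDTHS =
      Map.of("INT", 4, "BIGINT", 8, "FLOAT8", 8);

  /**
   * @param type        column type name (hypothetical)
   * @param incomingAvg average value size observed in the incoming batch
   *                    (what RecordBatchSizer would report), or -1 if there
   *                    is no incoming batch to measure
   */
  static int allocationWidth(String type, int incomingAvg) {
    Integer fixed = FIXED_WIDTHS.get(type);
    if (fixed != null) {
      return fixed;        // exact size is known; no sizer needed
    }
    if (incomingAvg > 0) {
      return incomingAvg;  // var-width: use the measured average
    }
    return 50;             // var-width with nothing to measure: default
  }
}
```

The point of the sketch is the branch order: running a sizer on a freshly
created, empty vector (as in the diff above) measures nothing useful, which is
why a just-created var-width vector reports an estSize of 1.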
---