[GitHub] drill pull request #1228: DRILL-6307: Handle empty batches in record batch s...

paul-rogers Sun, 22 Apr 2018 19:30:10 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r183264996
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --
    
    I wonder if all of this would be clearer if we handled it at size 
estimation time. If the row count is 0, set up everything using the standard 
sizes. (Note: the whole reason this class exists is that the standard sizes 
turned out to be *very* poor estimators of actual size.)
    
    So, if we have no data, guess the same size as `AllocationHelper`, else use 
real sizes.
    
    And, again the question: under what situation do we want to use the sizer 
if we don't actually have any data? For the first batch, if no data, just throw 
away the empty batch and don't size it. Turn around and get another until we 
receive a non-empty batch.
    
    If we've already received at least one non-empty batch and then receive an 
empty batch, we should just retain the estimates from the non-empty batch, 
since they will be much better than made-up numbers.
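
    A minimal sketch of the policy described above, to make the suggestion 
concrete. This is not the `RecordBatchSizer` code: the class name 
`RowWidthEstimator`, its methods, and the `STANDARD_ROW_WIDTH` value are all 
hypothetical, and the real sizer tracks per-column sizes rather than a single 
row width.

```java
/**
 * Hypothetical sketch of the "ignore empty batches" policy.
 * All names and the standard-width value are illustrative assumptions,
 * not part of Drill's RecordBatchSizer.
 */
class RowWidthEstimator {
    // Assumed placeholder for a "standard" row width. It is used only
    // until the first non-empty batch arrives.
    static final int STANDARD_ROW_WIDTH = 50;

    private int rowWidth = STANDARD_ROW_WIDTH;
    private boolean hasEstimate = false;

    /**
     * Updates the estimate from one incoming batch.
     * An empty batch carries no size information, so it is skipped and
     * whatever estimate we already hold is retained.
     */
    void update(int rowCount, int totalBytes) {
        if (rowCount == 0) {
            return; // keep the prior estimate (or the standard guess)
        }
        // Round up so that the estimate never under-allocates.
        rowWidth = (totalBytes + rowCount - 1) / rowCount;
        hasEstimate = true;
    }

    /** Current row width: measured if any data has been seen, else standard. */
    int rowWidth() {
        return rowWidth;
    }

    /** True once at least one non-empty batch has been observed. */
    boolean hasEstimate() {
        return hasEstimate;
    }
}
```

    With this shape, the caller never needs a separate "alloc size" field: an 
empty first batch leaves the standard guess in place, and an empty later batch 
leaves the last measured width untouched.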

---
