Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184200281
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -277,18 +286,29 @@ public boolean isRepeatedList() {
         /**
          * This is the average per entry width, used for vector allocation.
          */
    -    public int getEntryWidth() {
    +    private int getEntryWidthForAlloc() {
           int width = 0;
           if (isVariableWidth) {
    -        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +        width = getAllocSizePerEntry() - OFFSET_VECTOR_WIDTH;
     
             // Subtract out the bits (is-set) vector width
    -        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +        if (isOptional) {
               width -= BIT_VECTOR_WIDTH;
             }
    +
    +        if (isRepeated && getValueCount() == 0) {
    +          return (safeDivide(width, STD_REPETITION_FACTOR));
    +        }
           }
     
    -      return (safeDivide(width, cardinality));
    +      return (safeDivide(width, getEntryCardinalityForAlloc()));
    +    }
    +
    +    /**
    +     * This is the average per entry cardinality, used for vector allocation.
    +     */
    +    private float getEntryCardinalityForAlloc() {
    +      return getCardinality() == 0 ? (isRepeated ? STD_REPETITION_FACTOR : 1) :getCardinality();
    --- End diff --
    
    This is for joins. We allocate vectors based on the sizing information from the first batch, so if that first batch is empty, we allocate vectors with zero capacity. When we then read the next batch, which has data, we end up going through the realloc loop as we write values. For example, for a left outer join, if the right-side batch is empty, we still have to include the right-side columns as nulls in the outgoing batch. With the new lateral join operator, we see the same problem when the input has an empty array as the first record in the unnest column.
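
A minimal sketch of the fallback described above, not the actual `RecordBatchSizer` code. The class `AllocSketch`, both method names, and the value 10 for `STD_REPETITION_FACTOR` are assumptions for illustration; the diff does not show the constant's value.

```java
public class AllocSketch {
  // Assumed value for illustration; the diff does not show the real constant.
  static final int STD_REPETITION_FACTOR = 10;

  // Same shape as getEntryCardinalityForAlloc() in the diff: when the observed
  // cardinality is zero (e.g. an empty first batch), fall back to a default
  // so that the allocation does not end up with zero capacity.
  static float entryCardinalityForAlloc(float observedCardinality, boolean isRepeated) {
    if (observedCardinality == 0) {
      return isRepeated ? STD_REPETITION_FACTOR : 1;
    }
    return observedCardinality;
  }

  // Hypothetical helper: number of inner entries to reserve for rowCount rows.
  static int entriesToAllocate(int rowCount, float observedCardinality, boolean isRepeated) {
    return (int) Math.ceil(rowCount * entryCardinalityForAlloc(observedCardinality, isRepeated));
  }
}
```

With an empty first batch, `entriesToAllocate(100, 0f, true)` reserves room for 1000 entries under the assumed factor instead of 0, so writing the next, non-empty batch does not start from a zero-capacity vector.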


---
