[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

paul-rogers Mon, 29 Jan 2018 18:10:44 -0800

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r164623527
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
    @@ -232,9 +251,8 @@ else if (width > 0) {
         }
       }
     
    -  public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
    -
       private List<ColumnSize> columnSizes = new ArrayList<>();
    +  private Map<String, ColumnSize> columnSizeMap = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    Drill is case-insensitive internally, so the case-insensitive map is correct. 
Thanks for catching this, @ilooner!
    
    Unfortunately, record batches have no name space: they are just a 
collection of vectors. So, we could end up with columns called both "c" and 
"C". This situation will cause the column size map to end up with one entry for 
both columns, with the last writer winning.
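
    A minimal sketch of that collision, for illustration only. It uses a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` as a stand-in for Drill's `CaseInsensitiveMap`, and plain integers in place of `ColumnSize`; the class and method names are hypothetical and not part of the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnSizeCollision {
    // Builds a case-insensitive name -> size map. The TreeMap here is only a
    // stand-in for Drill's CaseInsensitiveMap (an assumption for this sketch).
    public static Map<String, Integer> buildSizeMap(List<String> names, List<Integer> sizes) {
        Map<String, Integer> sizeMap = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i < names.size(); i++) {
            // A name that differs only in case overwrites the earlier value.
            sizeMap.put(names.get(i), sizes.get(i));
        }
        return sizeMap;
    }

    public static void main(String[] args) {
        // Two distinct vectors, "c" (size 4) and "C" (size 50), in one batch.
        Map<String, Integer> sizeMap = buildSizeMap(List.of("c", "C"), List.of(4, 50));
        // prints: 1 entry, size = 50
        System.out.println(sizeMap.size() + " entry, size = " + sizeMap.get("c"));
    }
}
```

    The size recorded for "c" is silently lost, so any per-column lookup returns the size of whichever column was added last.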
    
    The best solution would be to enforce name space uniqueness when creating 
vectors. The new "result set loader" does this, but I suspect other readers 
might not -- depending on the particular way that they create their vectors. 
Still, creating names that differ only in case is a bug and any code doing that 
should be fixed.
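
    One way such a uniqueness check could look at vector-creation time is sketched below. This is an assumption about the general approach, not the result set loader's actual code; `UniqueColumnNames` and its `add` method are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UniqueColumnNames {
    // Lower-cased names seen so far; lower-casing makes the check case-insensitive.
    private final Set<String> seen = new HashSet<>();

    // Registers a column name, rejecting any name that matches an existing
    // one when case is ignored (for example "C" after "c").
    public void add(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        if (!seen.add(key)) {
            throw new IllegalArgumentException(
                "Duplicate column name (ignoring case): " + name);
        }
    }

    public static void main(String[] args) {
        UniqueColumnNames names = new UniqueColumnNames();
        names.add("c");
        try {
            names.add("C"); // differs only in case, so it is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    Failing fast at creation time surfaces the offending reader directly, instead of letting the conflict show up later as a wrong size estimate.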
