Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1101#discussion_r164623527
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
---
@@ -232,9 +251,8 @@ else if (width > 0) {
}
}
- public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
-
private List<ColumnSize> columnSizes = new ArrayList<>();
+ private Map<String, ColumnSize> columnSizeMap = CaseInsensitiveMap.newHashMap();
--- End diff --
Drill is case insensitive internally. The case insensitive map is correct.
Thanks for catching this @ilooner!
Unfortunately, record batches have no name space: they are just a
collection of vectors. So, we could end up with columns called both "c" and
"C". This situation will cause the column size map to end up with one entry for
both columns, with the last writer winning.
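To make the collision concrete, here is a minimal sketch. It does not use
Drill's CaseInsensitiveMap; a java.util.TreeMap with
String.CASE_INSENSITIVE_ORDER stands in for it, and the class and method names
(ColumnSizeCollision, sizesByName) and the integer sizes are made up for
illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the column size map; not Drill code.
final class ColumnSizeCollision {

  // Records one size per column name, comparing names without regard to case.
  static Map<String, Integer> sizesByName(String[] names, int[] sizes) {
    Map<String, Integer> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    for (int i = 0; i < names.length; i++) {
      // A second name that differs only in case overwrites the earlier value.
      map.put(names[i], sizes[i]);
    }
    return map;
  }

  public static void main(String[] args) {
    Map<String, Integer> map =
        sizesByName(new String[] {"c", "C"}, new int[] {4, 8});
    System.out.println(map.size());   // prints 1
    System.out.println(map.get("c")); // prints 8: the last writer wins
  }
}
```

Two vectors went in, one entry came out, and the size recorded for "c" is
silently lost.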
The best solution would be to enforce name space uniqueness when creating
vectors. The new "result set loader" does this, but I suspect other readers
might not -- depending on the particular way that they create their vectors.
Still, creating names that differ only in case is a bug and any code doing that
should be fixed.
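As a rough sketch of what such enforcement could look like at vector-creation
time (this is not how the result set loader is implemented; the class
UniqueColumnNames and its add method are hypothetical), a case-insensitive set
can reject the second name up front:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical guard that a reader could consult before creating a vector.
final class UniqueColumnNames {

  // Names are compared without regard to case, so "c" and "C" are the same.
  private final Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

  // Registers a column name, failing fast if it collides with an existing one.
  void add(String name) {
    if (!names.add(name)) {
      throw new IllegalArgumentException(
          "Duplicate column name (ignoring case): " + name);
    }
  }
}
```

With a check like this, a reader that tries to create both "c" and "C" fails
at the point of the bug, rather than producing a batch whose sizes are
misreported later.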
---