Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1228#discussion_r183264996
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java
---
@@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
*/
private int netRowWidth;
private int netRowWidthCap50;
+
+ /**
+ * actual row size if input is not empty. Otherwise, standard size.
+ */
+ private int rowAllocSize;
--- End diff --
I wonder if this all would be clearer if we handled it at size-estimation
time. If the row count is 0, set up everything using the standard sizes. (Note:
the whole reason this class exists is that the standard sizes turned out to be
*very* poor estimators of actual size.)
So, if we have no data, guess the same size that `AllocationHelper` would;
otherwise, use the real sizes.
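
To make the idea concrete, here is a rough sketch of deciding once, at estimation time. Everything in it is hypothetical: `RowWidthEstimate`, `ASSUMED_STD_ROW_WIDTH`, and the value 50 are placeholders for illustration, not names or numbers taken from `RecordBatchSizer` or `AllocationHelper`.

```java
// Hypothetical sketch only; not the actual RecordBatchSizer code.
final class RowWidthEstimate {
  // Assumed placeholder for the "standard" per-row width an allocator
  // would guess when it has no data to measure.
  static final int ASSUMED_STD_ROW_WIDTH = 50;

  // Decide once, at estimation time, which width to report.
  static int estimate(int rowCount, long totalDataBytes) {
    if (rowCount == 0) {
      // No data: fall back to the standard guess.
      return ASSUMED_STD_ROW_WIDTH;
    }
    // Real data: average bytes per row, rounded up.
    return (int) ((totalDataBytes + rowCount - 1) / rowCount);
  }
}
```

With that shape, callers would not need a separate "alloc size" field; they would read one estimate that is already correct for the empty and non-empty cases.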
And, again, the question: in what situation do we want to use the sizer
if we don't actually have any data? If the first batch has no data, just throw
away the empty batch and don't size it. Turn around and get another until we
receive a non-empty batch.
If we've already received at least one non-empty batch and then receive an
empty batch, we should just retain the estimates from the non-empty batch, since
they will be much better than just making up numbers.
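
A minimal sketch of that policy, assuming the caller can see each batch's row count and total data size. The class and method names (`RetainedRowWidth`, `observe`, `hasEstimate`, `rowWidth`) are made up for illustration and do not exist in Drill.

```java
// Hypothetical sketch only; names are illustrative, not Drill APIs.
final class RetainedRowWidth {
  private boolean hasEstimate = false;
  private int rowWidth = 0;

  // Returns true if the batch was used for sizing, false if it was
  // empty and skipped. An empty batch never overwrites a prior estimate.
  boolean observe(int rowCount, long totalDataBytes) {
    if (rowCount == 0) {
      return false;
    }
    // Average bytes per row, rounded up.
    rowWidth = (int) ((totalDataBytes + rowCount - 1) / rowCount);
    hasEstimate = true;
    return true;
  }

  // False until at least one non-empty batch has been observed.
  boolean hasEstimate() {
    return hasEstimate;
  }

  // Estimate from the most recent non-empty batch.
  int rowWidth() {
    if (!hasEstimate) {
      throw new IllegalStateException("no non-empty batch seen yet");
    }
    return rowWidth;
  }
}
```

An operator using something like this would keep fetching batches while `hasEstimate()` is false, and would simply ignore the return value of `observe` for later empty batches.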
---