Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r183264996

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
    */
   private int netRowWidth;
   private int netRowWidthCap50;
+
+  /**
+   * actual row size if input is not empty. Otherwise, standard size.
+   */
+  private int rowAllocSize;
--- End diff --

I wonder if this would all be clearer if we handled it at size estimation time. If the row count is 0, set up everything using the standard sizes. (Note: the whole reason this class exists is that the standard sizes turned out to be *very* poor estimators of actual size.) So, if we have no data, guess the same size as `AllocationHelper`; otherwise, use the real sizes.

And, again, the question: in what situation do we want to use the sizer if we don't actually have any data? For the first batch, if it has no data, just throw away the empty batch and don't size it. Turn around and get another until we receive a non-empty batch. If we have already received at least one non-empty batch and then receive an empty batch, we should just retain the estimates from the non-empty batch, since they will be much better than made-up numbers.
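A minimal sketch of the policy suggested in this comment, assuming a hypothetical `RowSizeEstimator` helper that is not part of Drill's `RecordBatchSizer` API. The method names `shouldSkip` and `rowAllocSize` and the fallback width of 50 bytes are illustrative assumptions; the fallback merely stands in for whatever standard size `AllocationHelper` would use. It shows the three cases: skip empty batches until the first non-empty one, use measured sizes when there is data, and keep the last measured estimate when a later batch is empty.

```java
/**
 * Hypothetical sketch (not Drill's RecordBatchSizer API): decides which
 * per-row width to allocate for, given the row count of the incoming batch.
 */
class RowSizeEstimator {

  // Assumed fallback width in bytes, standing in for the "standard sizes"
  // AllocationHelper would use. The actual value here is illustrative only.
  private final int standardRowWidth;

  // Row width measured from the most recent non-empty batch,
  // or -1 if no non-empty batch has been seen yet.
  private int lastMeasuredRowWidth = -1;

  RowSizeEstimator(int standardRowWidth) {
    this.standardRowWidth = standardRowWidth;
  }

  /** True once at least one non-empty batch has been measured. */
  boolean hasEstimate() {
    return lastMeasuredRowWidth >= 0;
  }

  /**
   * Before the first non-empty batch, an empty batch carries no sizing
   * information, so the caller should discard it and fetch another.
   */
  boolean shouldSkip(int rowCount) {
    return rowCount == 0 && !hasEstimate();
  }

  /**
   * Returns the row width to allocate for:
   * - non-empty batch: use (and remember) the measured width;
   * - empty batch after real data: keep the earlier measured estimate;
   * - empty batch with no history: fall back to the standard width.
   * measuredRowWidth is ignored when rowCount is 0.
   */
  int rowAllocSize(int rowCount, int measuredRowWidth) {
    if (rowCount > 0) {
      lastMeasuredRowWidth = measuredRowWidth;
      return measuredRowWidth;
    }
    return hasEstimate() ? lastMeasuredRowWidth : standardRowWidth;
  }

  public static void main(String[] args) {
    RowSizeEstimator est = new RowSizeEstimator(50);
    System.out.println(est.shouldSkip(0));           // true: empty first batch
    System.out.println(est.rowAllocSize(0, 0));      // 50: no data seen yet
    System.out.println(est.rowAllocSize(1000, 120)); // 120: measured width
    System.out.println(est.shouldSkip(0));           // false: have an estimate
    System.out.println(est.rowAllocSize(0, 0));      // 120: retained estimate
  }
}
```

Keeping the "no data" decision inside the estimator, rather than storing a separate `rowAllocSize` field on the sizer, means callers never see made-up numbers once real data has arrived.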
---