One more thought:

> 3) Assuming that you go with the average batch size calculation approach,
The average batch size approach is a quick and dirty technique for non-leaf operators, which can observe an incoming batch to estimate row width. Because Drill batches are large, the law of large numbers says that the average row width of a large input batch is likely to be a good estimator of the average row width of a large output batch. Note that this works only because non-leaf operators have an input batch to sample. Leaf operators (readers) do not have that luxury, which is why the result set loader instead uses the actual accumulated size of the current batch.

Also note that the average row method, while handy, is not optimal: it will, in general, cause more internal fragmentation than the result set loader. Why? The result set loader packs vectors right up to the point where the largest vector would overflow. The average row method works at the aggregate level and so will likely leave wasted space (internal fragmentation) in the largest vector. Said another way, with the average row size method we could usually pack in a few more rows before the batch actually fills, so we end up with batches of lower "density" than the optimal. This matters when the consuming operator is a buffering one, such as sort. (There is a quick sketch of the numbers at the end of this note.)

The key reason Padma is using the quick & dirty average row size method is not that it is ideal (it is not), but that it is, in fact, quick. We do want to move to the result set loader over time so we get improved memory utilization. It is also the only way to control row size in readers such as CSV or JSON, where we have no size information until we read the data.
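To make the fragmentation point concrete, here is a rough sketch of the arithmetic in Java. The class and method names are made up for illustration, not actual Drill APIs, and it assumes a 16 MiB batch budget with power-of-two buffer allocation:

/**
 * Hypothetical sketch comparing the two batch-sizing strategies.
 * Simplified arithmetic only; not actual Drill code.
 */
public class BatchSizerSketch {

  static final long BATCH_BUDGET = 16L * 1024 * 1024; // 16 MiB output budget

  // Average-row method: sample an incoming batch, compute its mean row
  // width, and size the output batch as budget / average width.
  static long rowsByAverageWidth(long incomingBytes, long incomingRows) {
    long avgRowWidth = Math.max(1, incomingBytes / incomingRows);
    return BATCH_BUDGET / avgRowWidth;
  }

  // Result-set-loader style: keep writing rows until the largest vector's
  // buffer (rounded up to a power of two, as direct-memory allocations
  // typically are) would overflow.
  static long rowsByLargestVector(long largestVectorAlloc, long avgColWidth) {
    return largestVectorAlloc / avgColWidth;
  }

  public static void main(String[] args) {
    // Two columns: an 8-byte BIGINT plus a VarChar averaging 100 bytes,
    // so 108 bytes per row on average in the sampled input batch.
    long avgRows = rowsByAverageWidth(108_000, 1_000);             // 155,344 rows

    // Those rows put ~15.5 million bytes into the VarChar vector, which is
    // served by a 16 MiB (16,777,216-byte) power-of-two buffer. Per-vector
    // packing can fill the rest of that same buffer.
    long packedRows = rowsByLargestVector(16L * 1024 * 1024, 100); // 167,772 rows

    System.out.printf("average-row method: %,d rows; per-vector packing: %,d rows%n",
        avgRows, packedRows);
    // The ~12,000-row gap is the internal fragmentation described above:
    // the average-row batch leaves roughly 1.2 MB of the largest vector unused.
  }
}

In a real operator the sampled width would, of course, come from the incoming batch's buffers rather than constants, but the comparison works out the same way.

- Paul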