Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103333406
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
         long origInputBatchSize = estimatedInputBatchSize;
         estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize);
     
    +    // The row width may end up as zero if all fields are nulls or some
    +    // other unusual situation. In this case, assume a width of 10 just
    +    // to avoid lots of special case code.
    +
    +    if (estimatedRowWidth == 0) {
    +      estimatedRowWidth = 10;
    --- End diff --
    
    This is a very peculiar case that came up in testing. It turns out we can have a row with a single column, and that column is always null. Imagine a Parquet file with 1 million Varchars, all of which are null. In every batch, the computed row width will be 0. Since we often divide by the row width, bad things happen. So, here, if the row width comes out abnormally small, we arbitrarily assume 10 bytes to avoid a pile of special-case calcs. (The calcs are already too complex.)
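    
    Just to illustrate the kind of division that goes wrong, here is a minimal made-up sketch (not the actual ExternalSortBatch code; `memoryBudget` and `estimateRowsPerBatch` are invented names for this example):
    
        // A made-up sketch, not the actual ExternalSortBatch code.
        public class RowWidthGuard {
    
          // Fallback row width used when the sizer reports zero (e.g. an all-null column).
          private static final int MIN_ROW_WIDTH = 10;
    
          // Estimate how many rows fit in a memory budget. Without the guard,
          // a zero row width turns this into a divide-by-zero.
          static int estimateRowsPerBatch(long memoryBudget, int estimatedRowWidth) {
            if (estimatedRowWidth == 0) {
              estimatedRowWidth = MIN_ROW_WIDTH;
            }
            return (int) (memoryBudget / estimatedRowWidth);
          }
    
          public static void main(String[] args) {
            // A batch of all-null Varchars could report a width of 0.
            System.out.println(estimateRowsPerBatch(16 * 1024 * 1024, 0));   // falls back to 10 bytes
            System.out.println(estimateRowsPerBatch(16 * 1024 * 1024, 50));  // normal case
          }
        }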
    
    If there were 1000 columns, all of which were null, we would write 1000 "bit" (really, byte) vectors, so each row would be 1000 bytes wide. But in that case the batch analyzer should have come up with a row width other than 0.
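    
    The back-of-the-envelope arithmetic for that case (again just an illustration, assuming one byte per null indicator, which is how the "bit" vectors are actually laid out):
    
        // Back-of-the-envelope check for the 1000-column, all-null case; not Drill code.
        public class NullColumnWidth {
          public static void main(String[] args) {
            int nullableColumns = 1000;
            int bytesPerNullIndicator = 1;  // "bit" vectors really store one byte per value
            // Even when every value is null, each row still carries its null indicators,
            // so the sizer should report roughly 1000 bytes per row, not 0.
            System.out.println((nullableColumns * bytesPerNullIndicator) + " bytes per row");
          }
        }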

