GitHub user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/767#discussion_r104252291
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
    @@ -1333,8 +1339,43 @@ private void spillFromMemory() {
         mergeAndSpill(bufferedBatches, spillCount);
       }
     
    +  private void mergeRuns(int targetCount) {
    +
    +    // Determine the number of runs to merge. The count should be the
    +    // target count. However, to prevent possible memory overrun, we
    +    // double-check with actual spill batch size and only spill as much
    +    // as fits in the merge memory pool.
    +
    +    int mergeCount = 0;
    +    long mergeSize = 0;
    +    for (SpilledRun batch : spilledRuns) {
    +      long batchSize = batch.getBatchSize();
    +      if (mergeSize + batchSize > mergeMemoryPool) {
    +        break;
    +      }
    +      mergeSize += batchSize;
    +      mergeCount++;
    +      if (mergeCount == targetCount) {
    +        break;
    +      }
    +    }
    +
    +    // Must always spill at least 2, even if this creates an over-size
    +    // spill file. But, if this is a final consolidation, we may have only
    +    // a single batch.
    +
    +    mergeCount = Math.max(mergeCount, 2);
    +    mergeCount = Math.min(mergeCount, spilledRuns.size());
    +
    +    // Do the actual spill.
    +
    +    mergeAndSpill(spilledRuns, mergeCount);
    --- End diff --
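To make the selection rule in `mergeRuns()` concrete, here is a standalone sketch of the same loop. It is not Drill code: `Run` is a hypothetical stand-in for `SpilledRun` that keeps only a batch size, and `chooseMergeCount` is an illustrative name. It walks the runs in list order, takes a prefix whose summed batch sizes fit in the merge pool (capped at `targetCount`), then clamps the result to at least 2 and at most the number of runs. Records need Java 16 or later.

```java
import java.util.List;

public class MergeCountSketch {

    // Hypothetical stand-in for Drill's SpilledRun; only the batch size matters here.
    record Run(String name, long batchSize) {}

    // Mirrors the loop in the diff: take a prefix of runs (in list order) whose
    // batch sizes fit in the merge memory pool, stopping at targetCount runs,
    // then clamp to at least 2 and at most runs.size().
    static int chooseMergeCount(List<Run> runs, int targetCount, long mergeMemoryPool) {
        int mergeCount = 0;
        long mergeSize = 0;
        for (Run run : runs) {
            long batchSize = run.batchSize();
            if (mergeSize + batchSize > mergeMemoryPool) {
                break; // next run would overflow the merge pool
            }
            mergeSize += batchSize;
            mergeCount++;
            if (mergeCount == targetCount) {
                break; // reached the requested number of runs
            }
        }
        mergeCount = Math.max(mergeCount, 2);           // always merge at least 2...
        mergeCount = Math.min(mergeCount, runs.size()); // ...unless fewer runs exist
        return mergeCount;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
            new Run("a", 40), new Run("b", 40), new Run("c", 40), new Run("d", 40));
        System.out.println(chooseMergeCount(runs, 4, 100));  // pool fits two batches
        System.out.println(chooseMergeCount(runs, 3, 1000)); // capped by targetCount
        System.out.println(chooseMergeCount(runs, 4, 10));   // nothing fits, clamped to 2
    }
}
```

Note how the clamp order matters: with a single remaining run, `Math.min` brings the count back down to 1, which matches the "final consolidation" case mentioned in the diff comment.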
    
    Just a comment: we always merge the _FIRST_ mergeCount runs. So if one of
    the first runs has some crazy "oversized" batch, we would repeatedly merge
    only a small number of runs, because that oversized batch may be carried
    forward into each new merged run.
    Alternatively, we could select the runs with the smallest "max batch" sizes,
    and so get more runs into each merge.
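A rough sketch of the suggested alternative, under stated assumptions: `Run` is a hypothetical descriptor whose `maxBatchSize` is the largest batch in that run, and `takeFitting` / `selectSmallestFirst` are illustrative names, not Drill APIs. The only change from the in-order prefix is sorting by `maxBatchSize` before the greedy fit. In an external merge sort any subset of sorted runs can be merged into a sorted run, so reordering the candidates does not affect correctness, only how many runs fit per pass. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SmallestBatchFirstSketch {

    // Hypothetical run descriptor: maxBatchSize is the largest batch in the run.
    record Run(String name, long maxBatchSize) {}

    // Greedy fit over the given order: stop when the next run would overflow
    // the pool or targetCount runs have been chosen (no min-2 clamp here).
    static List<Run> takeFitting(List<Run> ordered, int targetCount, long pool) {
        List<Run> chosen = new ArrayList<>();
        long used = 0;
        for (Run run : ordered) {
            if (chosen.size() == targetCount || used + run.maxBatchSize() > pool) {
                break;
            }
            used += run.maxBatchSize();
            chosen.add(run);
        }
        return chosen;
    }

    // Suggested alternative: consider runs with the smallest max batch first.
    static List<Run> selectSmallestFirst(List<Run> runs, int targetCount, long pool) {
        List<Run> sorted = new ArrayList<>(runs);
        sorted.sort(Comparator.comparingLong(Run::maxBatchSize)); // stable sort
        return takeFitting(sorted, targetCount, pool);
    }

    public static void main(String[] args) {
        // One oversized run at the front, followed by four small ones.
        List<Run> runs = List.of(
            new Run("big", 90), new Run("r1", 10), new Run("r2", 10),
            new Run("r3", 10), new Run("r4", 10));
        System.out.println(takeFitting(runs, 5, 100).size());         // in-order prefix
        System.out.println(selectSmallestFirst(runs, 5, 100).size()); // smallest first
    }
}
```

With a pool of 100, the in-order prefix picks only `big` and `r1`, while smallest-first picks all four small runs. The trade-off is that the oversized run is deferred rather than eliminated, so it still has to be merged eventually, possibly in a pass that relies on the "at least 2" rule.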

