Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/317#discussion_r48901098
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
    @@ -588,36 +571,29 @@ public BatchGroup 
mergeAndSpill(LinkedList<BatchGroup> batchGroups) throws Schem
         } finally {
           hyperBatch.clear();
         }
    -    long bufSize = getBufferSize(c1);
    -    totalSizeInMemory += bufSize;
    -    logger.debug("mergeAndSpill: final total size in memory = {}", 
totalSizeInMemory);
    +    logger.debug("mergeAndSpill: final total size in memory = {}", 
oAllocator.getAllocatedMemory());
         logger.info("Completed spilling to {}", outputFile);
         return newGroup;
       }
     
    -  private long getBufferSize(VectorAccessible batch) {
    -    long size = 0;
    -    for (VectorWrapper<?> w : batch) {
    -      DrillBuf[] bufs = w.getValueVector().getBuffers(false);
    -      for (DrillBuf buf : bufs) {
    -        size += buf.getPossibleMemoryConsumed();
    -      }
    -    }
    -    return size;
    -  }
    -
       private SelectionVector2 newSV2() throws OutOfMemoryException, 
InterruptedException {
    -    SelectionVector2 sv2 = new SelectionVector2(oContext.getAllocator());
    +    SelectionVector2 sv2 = new SelectionVector2(oAllocator);
         if (!sv2.allocateNewSafe(incoming.getRecordCount())) {
           try {
    -        // Not being able to allocate sv2 means this operator's allocator 
reached it's maximum capacity.
    -        // Spilling this.batchGroups won't help here as they are owned by 
upstream operator,
    -        // but spilling spilledBatchGroups may free enough memory
    -        final BatchGroup merged = mergeAndSpill(spilledBatchGroups);
    +        final BatchGroup merged = mergeAndSpill(batchGroups);
             if (merged != null) {
    -          spilledBatchGroups.addFirst(merged);
    +          spilledBatchGroups.add(merged);
    --- End diff --
    
    One possible explanation, and @StevenMPhillips could confirm it:
    
    sort tries to prevent `spilledBatchGroups` from becoming too large by 
spilling batches from this group regularly to disk, by putting the result of 
those spills at the beginning of spilledBatchGroups  we avoid spilling the same 
batches over and over again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to