Github user StevenMPhillips commented on a diff in the pull request:

    https://github.com/apache/drill/pull/317#discussion_r48900539
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
    @@ -588,36 +571,29 @@ public BatchGroup 
mergeAndSpill(LinkedList<BatchGroup> batchGroups) throws Schem
         } finally {
           hyperBatch.clear();
         }
    -    long bufSize = getBufferSize(c1);
    -    totalSizeInMemory += bufSize;
    -    logger.debug("mergeAndSpill: final total size in memory = {}", 
totalSizeInMemory);
    +    logger.debug("mergeAndSpill: final total size in memory = {}", 
oAllocator.getAllocatedMemory());
         logger.info("Completed spilling to {}", outputFile);
         return newGroup;
       }
     
    -  private long getBufferSize(VectorAccessible batch) {
    -    long size = 0;
    -    for (VectorWrapper<?> w : batch) {
    -      DrillBuf[] bufs = w.getValueVector().getBuffers(false);
    -      for (DrillBuf buf : bufs) {
    -        size += buf.getPossibleMemoryConsumed();
    -      }
    -    }
    -    return size;
    -  }
    -
       private SelectionVector2 newSV2() throws OutOfMemoryException, 
InterruptedException {
    -    SelectionVector2 sv2 = new SelectionVector2(oContext.getAllocator());
    +    SelectionVector2 sv2 = new SelectionVector2(oAllocator);
         if (!sv2.allocateNewSafe(incoming.getRecordCount())) {
           try {
    -        // Not being able to allocate sv2 means this operator's allocator 
reached it's maximum capacity.
    -        // Spilling this.batchGroups won't help here as they are owned by 
upstream operator,
    -        // but spilling spilledBatchGroups may free enough memory
    -        final BatchGroup merged = mergeAndSpill(spilledBatchGroups);
    +        final BatchGroup merged = mergeAndSpill(batchGroups);
             if (merged != null) {
    -          spilledBatchGroups.addFirst(merged);
    +          spilledBatchGroups.add(merged);
    --- End diff --
    
    The ordering in the queue only matters for determining which BatchGroups 
will be merged the next time mergeAndSpill is called. Since mergeAndSpill takes 
batches off the the end of the queue, we want to put the outcome at the front 
of the queue, since the outcome BatchGroup has a much larger spill size than 
the ones at the end of the queue, and is thus more expensive to merge.
    
    So I think you should revert this change, and leave it the way it is. It 
might be a good idea to take more extensive look at the spilling/merging 
strategy, but that can be a different task.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to