[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/761


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332364
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
 // Set too large and the ratio between memory and input data sizes 
becomes
 // small. Set too small and disk seek times dominate performance.
 
-spillBatchSize = 
config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+preferredSpillBatchSize = 
config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+// In low memory, use no more than 1/4 of memory for each spill batch. 
Ensures we
+// can merge.
+
+preferredSpillBatchSize = Math.min(preferredSpillBatchSize, 
memoryLimit / 4);
--- End diff --

In low memory conditions, restrict the spill batch size to 1/4 of memory. 
Why?

* We need to accumulate at least 2 such batches to do a merge. (Now at 1/2 
of memory.)
* We need to create an output batch from the two inputs (3/4 of memory).
* Need overhead for other direct memory uses. (Remaining 1/4 of memory.)

Sadly, memory management in Drill is not very precise: batch sizes can't be 
predicted with any accuracy. Trying to use, say, 1/3 of memory for the spill 
batch would seem more logical. (Two batches into the merge, one out), but the 
allocator issues a fatal error if we guess wrong by even one byte. So, we are 
forced to be conservative.

If we had better control, or a more forgiving allocator, we could make 
different choices.

Also, why try to sort GBs of data in 20 MB? Yet, this is the test case that 
had to be solved and that this particular fix enables.

I'm open to suggestions for better solutions; this is a very tricky area...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103335045
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
* This method spills only half the accumulated batches
* minimizing unnecessary disk writes. The exact count must lie between
* the minimum and maximum spill counts.
-*/
+   */
 
   private void spillFromMemory() {
 
 // Determine the number of batches to spill to create a spill file
 // of the desired size. The actual file size might be a bit larger
 // or smaller than the target, which is expected.
 
-long estSize = 0;
 int spillCount = 0;
+long spillSize = 0;
 for (InputBatch batch : bufferedBatches) {
-  estSize += batch.getDataSize();
-  if (estSize > spillFileSize) {
-break; }
+  long batchSize = batch.getDataSize();
+  spillSize += batchSize;
   spillCount++;
+  if (spillSize + batchSize / 2 > spillFileSize) {
+break; }
 }
 
-// Should not happen, but just to be sure...
+// Must always spill at least 2, even if this creates an over-size
+// spill file.
 
-if (spillCount == 0) {
-  return; }
+spillCount = Math.max(spillCount, 2);
 
 // Do the actual spill.
 
-logger.trace("Starting spill from memory. Memory = {}, Buffered batch 
count = {}, Spill batch count = {}",
- allocator.getAllocatedMemory(), bufferedBatches.size(), 
spillCount);
 mergeAndSpill(bufferedBatches, spillCount);
   }
 
   private void mergeAndSpill(LinkedList source, int 
count) {
-if (count == 0) {
-  return; }
 spilledRuns.add(doMergeAndSpill(source, count));
   }
 
   private BatchGroup.SpilledRun doMergeAndSpill(LinkedList batchGroups, int spillCount) {
 List batchesToSpill = Lists.newArrayList();
 spillCount = Math.min(batchGroups.size(), spillCount);
 assert spillCount > 0 : "Spill count to mergeAndSpill must not be 
zero";
-long spillSize = 0;
 for (int i = 0; i < spillCount; i++) {
-  @SuppressWarnings("resource")
-  BatchGroup batch = batchGroups.pollFirst();
-  assert batch != null : "Encountered a null batch during merge and 
spill operation";
-  batchesToSpill.add(batch);
-  spillSize += batch.getDataSize();
+  batchesToSpill.add(batchGroups.pollFirst());
--- End diff --

This won't happen for first-generation spills due to the check in 
`isSpillNeeded`. But, it could happen when spilling for merges, so I fixed the 
code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332807
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -765,12 +838,12 @@ private void processBatch() {
   spillFromMemory();
 }
 
-// Sanity check. We should now be above the spill point.
+// Sanity check. We should now be below the buffer memory maximum.
 
 long startMem = allocator.getAllocatedMemory();
-if (memoryLimit - startMem < spillPoint) {
-  logger.error( "ERROR: Failed to spill below the spill point. Spill 
point = {}, free memory = {}",
-spillPoint, memoryLimit - startMem);
+if (startMem > bufferMemoryPool) {
+  logger.error( "ERROR: Failed to spill above buffer limit. Buffer 
pool = {}, memory = {}",
+  bufferMemoryPool, startMem);
--- End diff --

We could. But, at this point, it is a potential problem, not a real one. 
Maybe the input has no rows; we won't overflow memory. Maybe just one or two 
rows and we'll be fine.

This warning says, "if we continue as we are now, and we have large amounts 
of data, we'll run off the rails." It helps explain any later OOM error that 
occurs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r10813
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / 
estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / 
estimatedRowWidth / 2);
--- End diff --

Yes. Another wonderful Drill artifact. Suppose we have 1023 bytes of data. 
We will allocate a vector of 1024 bytes in size. Suppose we have 1025 bytes of 
data. (Just 0.2% more.) We allocate a vector of 2048 bytes.

Now, we could be more conservative and assume that, on average, each vector 
will bye 3/4 full, so we should us a factor of 1.5 for the calcs. We can file a 
JIRA and experiment with this change as a future enhancement.

It would also help if the allocator didn't kill the query if we allocate 
even one extra byte. But, since math errors are fatal, we are 
super-conservative for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103335368
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -238,6 +238,25 @@ public boolean copyFromSafe(int fromIndex, int 
thisIndex, ${minor.class}Vector f
 return true;
   }
 
+  @Override
+  public int getAllocatedByteCount() {
+return offsetVector.getAllocatedByteCount() + 
super.getAllocatedByteCount();
--- End diff --

`super` is called only if the super class holds an actual buffer or nested 
vector: not all do.

A sanity test is done in the sort: we compare the calculated memory 
allocation with the actual memory delta when "taking ownership" of a batch. We 
get an error if the numbers differ. That check ensures that the calcs in each 
vector type are correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332576
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
 // Set too large and the ratio between memory and input data sizes 
becomes
 // small. Set too small and disk seek times dominate performance.
 
-spillBatchSize = 
config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+preferredSpillBatchSize = 
config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+// In low memory, use no more than 1/4 of memory for each spill batch. 
Ensures we
+// can merge.
+
+preferredSpillBatchSize = Math.min(preferredSpillBatchSize, 
memoryLimit / 4);
+
+// But, the spill batch should be above some minimum size to prevent 
complete
+// thrashing.
+
+preferredSpillBatchSize = Math.max(preferredSpillBatchSize, 
MIN_SPILL_BATCH_SIZE);
--- End diff --

Done later when we tally up memory needs and compare it to available 
memory. We issue an error log message if memory overflow is likely. This way, 
when a query fails, we can look at the log to see that we knew it would fail 
due to low memory limits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103330903
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, 
PhysicalOperator popConfig) {
 } else {
   fileManager = new HadoopFileManager(spillFs);
 }
-FragmentHandle handle = context.getHandle();
-spillDirName = String.format("%s_major%s_minor%s_op%s", 
QueryIdHelper.getQueryId(handle.getQueryId()),
-handle.getMajorFragmentId(), handle.getMinorFragmentId(), 
popConfig.getOperatorId());
+spillDirName = String.format(
--- End diff --

Either format is fine. Go ahead and overwrite this change with your 
preferred format when you commit your work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-27 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103331438
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -219,7 +220,18 @@
 
   private BatchSchema schema;
 
+  /**
+   * Incoming batches buffered in memory prior to spilling
+   * or an in-memory merge.
+   */
+
   private LinkedList bufferedBatches = 
Lists.newLinkedList();
+
+  /**
+   * Spilled runs consisting of a large number of spilled
+   * in-memory batches.
--- End diff --

Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068736
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / 
estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / 
estimatedRowWidth / 2);
--- End diff --

This factor of 2 seems extremely cautious; we may pay a little in 
performance every time, just for that.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103069029
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / 
estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / 
estimatedRowWidth / 2);
 spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
 
+// Compute the actual spill batch size which may be larger or smaller
+// than the preferred size depending on the row width. Double the 
estimated
+// memory needs to allow for power-of-two rounding.
+
+targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
 // Determine the number of records per batch per merge step. The goal 
is to
 // merge batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each merge batch, whichever is less.
 
-targetMergeBatchSize = preferredMergeBatchSize;
-mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / 
estimatedRowWidth);
+mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / 
estimatedRowWidth / 2);
 mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
 
 // Determine the minimum memory needed for spilling. Spilling is done 
just
 // before accepting a batch, so we must spill if we don't have room 
for a
 // (worst case) input batch. To spill, we need room for the output 
batch created
 // by merging the batches already in memory. Double this to allow for 
power-of-two
 // memory allocations.
 
-spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
 
 // The merge memory pool assumes we can spill all input batches. To 
make
 // progress, we must have at least two merge batches (same size as an 
output
 // batch) and one output batch. Again, double to allow for power-of-two
 // allocation and add one for a margin of error.
 
-int minMergeBatches = 2 * 3 + 1;
-long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+long minMergeMemory = Math.round((2 * targetSpillBatchSize + 
targetMergeBatchSize) * 1.05);
 
 // If we are in a low-memory condition, then we might not have room 
for the
 // default output batch size. In that case, pick a smaller size.
 
-long minMemory = Math.max(spillPoint, minMergeMemory);
-if (minMemory > memoryLimit) {
-
-  // Figure out the minimum output batch size based on memory, but 
can't be
-  // any smaller than the defined minimum.
-
-  targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / 
minMergeBatches);
+if (minMergeMemory > memoryLimit) {
 
-  // Regardless of anything else, the batch must hold at least one
-  // complete row.
+  // Figure out the minimum output batch size based on memory,
+  // must hold at least one complete row.
 
-  targetMergeBatchSize = Math.max(estimatedRowWidth, 
targetMergeBatchSize);
-  spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
-  minMergeMemory = minMergeBatches * targetMergeBatchSize;
+  long mergeAllowance = Math.round((memoryLimit - 2 * 
targetSpillBatchSize) * 0.95);
+  targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 
2);
+  mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth 
/ 2);
--- End diff --

If estimatedRowWidth is huge, then mergeBatchRowCount may be zero ! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103067062
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -238,6 +238,25 @@ public boolean copyFromSafe(int fromIndex, int 
thisIndex, ${minor.class}Vector f
 return true;
   }
 
+  @Override
+  public int getAllocatedByteCount() {
+return offsetVector.getAllocatedByteCount() + 
super.getAllocatedByteCount();
--- End diff --

Why don't the other getAllocatedByteCount() methods add the "super" ?  Is 
it because their supers do not allocate ?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068095
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -765,12 +838,12 @@ private void processBatch() {
   spillFromMemory();
 }
 
-// Sanity check. We should now be above the spill point.
+// Sanity check. We should now be below the buffer memory maximum.
 
 long startMem = allocator.getAllocatedMemory();
-if (memoryLimit - startMem < spillPoint) {
-  logger.error( "ERROR: Failed to spill below the spill point. Spill 
point = {}, free memory = {}",
-spillPoint, memoryLimit - startMem);
+if (startMem > bufferMemoryPool) {
+  logger.error( "ERROR: Failed to spill above buffer limit. Buffer 
pool = {}, memory = {}",
+  bufferMemoryPool, startMem);
--- End diff --

Do we need to return or throw an exception here ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103066042
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, 
PhysicalOperator popConfig) {
 } else {
   fileManager = new HadoopFileManager(spillFs);
 }
-FragmentHandle handle = context.getHandle();
-spillDirName = String.format("%s_major%s_minor%s_op%s", 
QueryIdHelper.getQueryId(handle.getQueryId()),
-handle.getMajorFragmentId(), handle.getMinorFragmentId(), 
popConfig.getOperatorId());
+spillDirName = String.format(
--- End diff --

Meanwhile I changed this code to print something like:


/tmp/drill/spill/27509954-4b86-b20a-f16a-c46644f74519_HashAgg_1-3_minor0/spill1

So the operator name "HashAgg_1-3" looks like what is presented in the web 
interface.
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort

2017-02-24 Thread paul-rogers
GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/761

DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read
* Ensure at least two batches are merged in low-memory conditions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5284

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #761


commit 5558e9439805d595cfc9625591a276385454625f
Author: Paul Rogers 
Date:   2017-02-24T18:31:25Z

DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read

commit 0028f26fef5d9b462700a28b689d47241ee3a1ce
Author: Paul Rogers 
Date:   2017-02-24T20:45:18Z

Fix for DRILL-5294

Under certain low-memory conditions, need to force the sort to merge
two batches to make progress, even though this is a bit more than
comfortably fits into memory.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---