[GitHub] drill pull request #761: DRILL-5284: Roll-up of final fixes for managed sort
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/761 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332364

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
     // Set too large and the ratio between memory and input data sizes becomes
     // small. Set too small and disk seek times dominate performance.

-    spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-    spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+    preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+    // In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
+    // can merge.
+
+    preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
--- End diff --

In low-memory conditions, restrict the spill batch size to 1/4 of memory. Why?

* We need to accumulate at least 2 such batches to do a merge. (Now at 1/2 of memory.)
* We need to create an output batch from the two inputs (3/4 of memory).
* We need overhead for other direct memory uses. (Remaining 1/4 of memory.)

Sadly, memory management in Drill is not very precise: batch sizes can't be predicted with any accuracy. Trying to use, say, 1/3 of memory for the spill batch (two batches into the merge, one out) would seem more logical, but the allocator issues a fatal error if we guess wrong by even one byte. So, we are forced to be conservative. If we had better control, or a more forgiving allocator, we could make different choices.

Also, why try to sort GBs of data in 20 MB? Yet, this is the test case that had to be solved and that this particular fix enables. I'm open to suggestions for better solutions; this is a very tricky area...
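The 1/4-of-memory budget described above can be sketched as a standalone calculation. Everything here is illustrative: the constant's value and the method name are assumptions for the sketch, not Drill's actual code.

```java
public class SpillBatchBudget {

    // Illustrative floor; the real MIN_SPILL_BATCH_SIZE lives in ExternalSortBatch.
    static final long MIN_SPILL_BATCH_SIZE = 256 * 1024;

    // Clamp the configured spill batch size so the memory budget works out:
    // two spill batches feeding a merge (1/2 of memory), one merged output
    // batch (3/4), and the last 1/4 left for other direct-memory overhead.
    static long clampSpillBatchSize(long configured, long memoryLimit) {
        long size = Math.min(configured, memoryLimit / 4);
        return Math.max(size, MIN_SPILL_BATCH_SIZE);
    }

    public static void main(String[] args) {
        // Ample memory (1 GB): the configured 8 MB survives untouched.
        System.out.println(clampSpillBatchSize(8L << 20, 1L << 30)); // 8388608
        // Low memory (20 MB, the stress-test case): clamped to 5 MB.
        System.out.println(clampSpillBatchSize(8L << 20, 20L << 20)); // 5242880
    }
}
```

Note how the clamp to `memoryLimit / 4` applies before the minimum-size floor, so an extremely small memory limit can still force a spill batch above 1/4 of memory; that is the trade-off the minimum exists to make.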
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103335045

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
    * This method spills only half the accumulated batches
    * minimizing unnecessary disk writes. The exact count must lie between
    * the minimum and maximum spill counts.
-  */
+   */
  private void spillFromMemory() {

    // Determine the number of batches to spill to create a spill file
    // of the desired size. The actual file size might be a bit larger
    // or smaller than the target, which is expected.

-   long estSize = 0;
    int spillCount = 0;
+   long spillSize = 0;
    for (InputBatch batch : bufferedBatches) {
-     estSize += batch.getDataSize();
-     if (estSize > spillFileSize) {
-       break; }
+     long batchSize = batch.getDataSize();
+     spillSize += batchSize;
      spillCount++;
+     if (spillSize + batchSize / 2 > spillFileSize) {
+       break; }
    }

-   // Should not happen, but just to be sure...
+   // Must always spill at least 2, even if this creates an over-size
+   // spill file.

-   if (spillCount == 0) {
-     return; }
+   spillCount = Math.max(spillCount, 2);

    // Do the actual spill.

-   logger.trace("Starting spill from memory. Memory = {}, Buffered batch count = {}, Spill batch count = {}",
-       allocator.getAllocatedMemory(), bufferedBatches.size(), spillCount);
    mergeAndSpill(bufferedBatches, spillCount);
  }

  private void mergeAndSpill(LinkedList source, int count) {
-   if (count == 0) {
-     return; }
    spilledRuns.add(doMergeAndSpill(source, count));
  }

  private BatchGroup.SpilledRun doMergeAndSpill(LinkedList batchGroups, int spillCount) {
    List batchesToSpill = Lists.newArrayList();
    spillCount = Math.min(batchGroups.size(), spillCount);
    assert spillCount > 0 : "Spill count to mergeAndSpill must not be zero";
-   long spillSize = 0;
    for (int i = 0; i < spillCount; i++) {
-     @SuppressWarnings("resource")
-     BatchGroup batch = batchGroups.pollFirst();
-     assert batch != null : "Encountered a null batch during merge and spill operation";
-     batchesToSpill.add(batch);
-     spillSize += batch.getDataSize();
+     batchesToSpill.add(batchGroups.pollFirst());
--- End diff --

This won't happen for first-generation spills due to the check in `isSpillNeeded`. But it could happen when spilling for merges, so I fixed the code.
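The batch-count selection in the reworked `spillFromMemory` can be sketched in isolation. This is a toy version with made-up batch sizes; the real method walks the sort's buffered `InputBatch` list rather than a list of longs.

```java
import java.util.Arrays;
import java.util.List;

public class SpillCountDemo {

    // Accumulate batches toward the target spill-file size; stop once the
    // running total overshoots the target by more than half the current batch.
    // Always spill at least two batches so a low-memory merge can make
    // progress, even if that produces an over-size spill file.
    static int spillCount(List<Long> batchSizes, long spillFileSize) {
        int count = 0;
        long spillSize = 0;
        for (long batchSize : batchSizes) {
            spillSize += batchSize;
            count++;
            if (spillSize + batchSize / 2 > spillFileSize) {
                break;
            }
        }
        return Math.max(count, 2);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Four 10 MB batches, 25 MB target file: spill the first three.
        System.out.println(spillCount(
            Arrays.asList(10 * mb, 10 * mb, 10 * mb, 10 * mb), 25 * mb)); // 3
        // One small batch: forced up to the minimum of two.
        System.out.println(spillCount(Arrays.asList(1 * mb), 100 * mb)); // 2
    }
}
```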
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332807

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -765,12 +838,12 @@ private void processBatch() {
      spillFromMemory();
    }

-   // Sanity check. We should now be above the spill point.
+   // Sanity check. We should now be below the buffer memory maximum.
    long startMem = allocator.getAllocatedMemory();
-   if (memoryLimit - startMem < spillPoint) {
-     logger.error( "ERROR: Failed to spill below the spill point. Spill point = {}, free memory = {}",
-         spillPoint, memoryLimit - startMem);
+   if (startMem > bufferMemoryPool) {
+     logger.error( "ERROR: Failed to spill above buffer limit. Buffer pool = {}, memory = {}",
+         bufferMemoryPool, startMem);
--- End diff --

We could. But, at this point, it is a potential problem, not a real one. Maybe the input has no rows; we won't overflow memory. Maybe just one or two rows and we'll be fine. This warning says, "if we continue as we are now, and we have large amounts of data, we'll run off the rails." It helps explain any later OOM error that occurs.
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r10813

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
    // spill batches of either 64K records, or as many records as fit into the
    // amount of memory dedicated to each spill batch, whichever is less.

-   spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+   spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
--- End diff --

Yes. Another wonderful Drill artifact. Suppose we have 1023 bytes of data. We will allocate a vector of 1024 bytes in size. Suppose we have 1025 bytes of data. (Just 0.2% more.) We allocate a vector of 2048 bytes.

Now, we could be more conservative and assume that, on average, each vector will be 3/4 full, so we should use a factor of 1.5 for the calcs. We can file a JIRA and experiment with this change as a future enhancement.

It would also help if the allocator didn't kill the query if we allocate even one extra byte. But, since math errors are fatal, we are super-conservative for now.
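The rounding behavior described above can be demonstrated directly. `roundUpPow2` is a hypothetical helper mimicking what a power-of-two allocator does when sizing vector buffers, not an actual Drill method.

```java
public class PowerOfTwoDemo {

    // Round a byte count up to the next power of two, as direct-memory
    // allocators commonly do when sizing buffers.
    static long roundUpPow2(long bytes) {
        long high = Long.highestOneBit(bytes);
        return (high == bytes) ? bytes : high << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpPow2(1023)); // 1024
        System.out.println(roundUpPow2(1024)); // 1024: exact powers are unchanged
        System.out.println(roundUpPow2(1025)); // 2048: 0.2% more data doubles the buffer
    }
}
```

In the worst case a buffer is just over half full, which is why the estimate above divides by 2: the in-memory footprint can be nearly double the raw data size.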
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103335368

--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -238,6 +238,25 @@ public boolean copyFromSafe(int fromIndex, int thisIndex, ${minor.class}Vector f
     return true;
   }

+  @Override
+  public int getAllocatedByteCount() {
+    return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
--- End diff --

`super` is called only if the superclass holds an actual buffer or nested vector: not all do. A sanity test is done in the sort: we compare the calculated memory allocation with the actual memory delta when "taking ownership" of a batch. We get an error if the numbers differ. That check ensures that the calcs in each vector type are correct.
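The accounting pattern under discussion can be illustrated with a toy vector hierarchy. These classes are invented for illustration; the real implementations are generated from the `VariableLengthVectors.java` template.

```java
public class VectorByteCountDemo {

    // A vector type whose superclass owns one real data buffer.
    static class BaseVector {
        private final int dataBufferBytes;
        BaseVector(int dataBufferBytes) { this.dataBufferBytes = dataBufferBytes; }
        int getAllocatedByteCount() { return dataBufferBytes; }
    }

    // A variable-length vector additionally owns an offsets vector, so its
    // count must include both: its own nested vector plus super's buffer.
    // A subclass calls super only when the superclass actually holds memory.
    static class VarLenVector extends BaseVector {
        private final BaseVector offsetVector;
        VarLenVector(int dataBufferBytes, int offsetBufferBytes) {
            super(dataBufferBytes);
            this.offsetVector = new BaseVector(offsetBufferBytes);
        }
        @Override
        int getAllocatedByteCount() {
            return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
        }
    }

    public static void main(String[] args) {
        // 4 KB of data plus a 1 KB offsets buffer: 5120 bytes total.
        System.out.println(new VarLenVector(4096, 1024).getAllocatedByteCount()); // 5120
    }
}
```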
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332576

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
     // Set too large and the ratio between memory and input data sizes becomes
     // small. Set too small and disk seek times dominate performance.

-    spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-    spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+    preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+    // In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
+    // can merge.
+
+    preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
+
+    // But, the spill batch should be above some minimum size to prevent complete
+    // thrashing.
+
+    preferredSpillBatchSize = Math.max(preferredSpillBatchSize, MIN_SPILL_BATCH_SIZE);
--- End diff --

Done later when we tally up memory needs and compare them to available memory. We issue an error log message if memory overflow is likely. This way, when a query fails, we can look at the log to see that we knew it would fail due to low memory limits.
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103330903

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, PhysicalOperator popConfig) {
    } else {
      fileManager = new HadoopFileManager(spillFs);
    }
-   FragmentHandle handle = context.getHandle();
-   spillDirName = String.format("%s_major%s_minor%s_op%s", QueryIdHelper.getQueryId(handle.getQueryId()),
-       handle.getMajorFragmentId(), handle.getMinorFragmentId(), popConfig.getOperatorId());
+   spillDirName = String.format(
--- End diff --

Either format is fine. Go ahead and overwrite this change with your preferred format when you commit your work.
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103331438

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -219,7 +220,18 @@
   private BatchSchema schema;

+  /**
+   * Incoming batches buffered in memory prior to spilling
+   * or an in-memory merge.
+   */
+  private LinkedList bufferedBatches = Lists.newLinkedList();
+
+  /**
+   * Spilled runs consisting of a large number of spilled
+   * in-memory batches.
--- End diff --

Fixed.
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068736

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
    // spill batches of either 64K records, or as many records as fit into the
    // amount of memory dedicated to each spill batch, whichever is less.

-   spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+   spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
--- End diff --

This factor of 2 seems extremely cautious; we may pay a little in performance every time, just for that.
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103069029

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
    // spill batches of either 64K records, or as many records as fit into the
    // amount of memory dedicated to each spill batch, whichever is less.

-   spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+   spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
    spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);

+   // Compute the actual spill batch size which may be larger or smaller
+   // than the preferred size depending on the row width. Double the estimated
+   // memory needs to allow for power-of-two rounding.
+
+   targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
    // Determine the number of records per batch per merge step. The goal is to
    // merge batches of either 64K records, or as many records as fit into the
    // amount of memory dedicated to each merge batch, whichever is less.

-   targetMergeBatchSize = preferredMergeBatchSize;
-   mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth);
+   mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / estimatedRowWidth / 2);
    mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+   targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;

    // Determine the minimum memory needed for spilling. Spilling is done just
    // before accepting a batch, so we must spill if we don't have room for a
    // (worst case) input batch. To spill, we need room for the output batch created
    // by merging the batches already in memory. Double this to allow for power-of-two
    // memory allocations.

-   spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+   long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;

    // The merge memory pool assumes we can spill all input batches. To make
    // progress, we must have at least two merge batches (same size as an output
    // batch) and one output batch. Again, double to allow for power-of-two
    // allocation and add one for a margin of error.

-   int minMergeBatches = 2 * 3 + 1;
-   long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+   long minMergeMemory = Math.round((2 * targetSpillBatchSize + targetMergeBatchSize) * 1.05);

    // If we are in a low-memory condition, then we might not have room for the
    // default output batch size. In that case, pick a smaller size.

-   long minMemory = Math.max(spillPoint, minMergeMemory);
-   if (minMemory > memoryLimit) {
-
-     // Figure out the minimum output batch size based on memory, but can't be
-     // any smaller than the defined minimum.
-
-     targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / minMergeBatches);
+   if (minMergeMemory > memoryLimit) {

-     // Regardless of anything else, the batch must hold at least one
-     // complete row.
+     // Figure out the minimum output batch size based on memory,
+     // must hold at least one complete row.

-     targetMergeBatchSize = Math.max(estimatedRowWidth, targetMergeBatchSize);
-     spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
-     minMergeMemory = minMergeBatches * targetMergeBatchSize;
+     long mergeAllowance = Math.round((memoryLimit - 2 * targetSpillBatchSize) * 0.95);
+     targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 2);
+     mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth / 2);
--- End diff --

If estimatedRowWidth is huge, then mergeBatchRowCount may be zero!
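Ben-Zvi's edge case is easy to reproduce: with a sufficiently large estimated row width, the integer division yields zero rows. Below is a hedged sketch of the problem alongside an illustrative `Math.max(1, ...)` guard, which mirrors the guard used elsewhere in the same method; it is not necessarily the exact fix that was adopted.

```java
public class MergeRowCountDemo {

    // Unguarded version, as in the diff above: can return 0 for huge rows.
    static int unguarded(long targetMergeBatchSize, long estimatedRowWidth) {
        return (int) (targetMergeBatchSize / estimatedRowWidth / 2);
    }

    // Guarded version: a merge batch must hold at least one row.
    static int guarded(long targetMergeBatchSize, long estimatedRowWidth) {
        return (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth / 2);
    }

    public static void main(String[] args) {
        // Typical case: 16 MB merge batch, 256-byte rows.
        System.out.println(guarded(16L << 20, 256)); // 32768
        // Pathological case: a 32 MB row width against a 16 MB batch.
        System.out.println(unguarded(16L << 20, 32L << 20)); // 0 -- the bug
        System.out.println(guarded(16L << 20, 32L << 20));   // 1
    }
}
```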
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103067062

--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -238,6 +238,25 @@ public boolean copyFromSafe(int fromIndex, int thisIndex, ${minor.class}Vector f
     return true;
   }

+  @Override
+  public int getAllocatedByteCount() {
+    return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
--- End diff --

Why don't the other getAllocatedByteCount() methods add the "super"? Is it because their supers do not allocate?
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068095

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -765,12 +838,12 @@ private void processBatch() {
      spillFromMemory();
    }

-   // Sanity check. We should now be above the spill point.
+   // Sanity check. We should now be below the buffer memory maximum.
    long startMem = allocator.getAllocatedMemory();
-   if (memoryLimit - startMem < spillPoint) {
-     logger.error( "ERROR: Failed to spill below the spill point. Spill point = {}, free memory = {}",
-         spillPoint, memoryLimit - startMem);
+   if (startMem > bufferMemoryPool) {
+     logger.error( "ERROR: Failed to spill above buffer limit. Buffer pool = {}, memory = {}",
+         bufferMemoryPool, startMem);
--- End diff --

Do we need to return or throw an exception here?
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103066042

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, PhysicalOperator popConfig) {
    } else {
      fileManager = new HadoopFileManager(spillFs);
    }
-   FragmentHandle handle = context.getHandle();
-   spillDirName = String.format("%s_major%s_minor%s_op%s", QueryIdHelper.getQueryId(handle.getQueryId()),
-       handle.getMajorFragmentId(), handle.getMinorFragmentId(), popConfig.getOperatorId());
+   spillDirName = String.format(
--- End diff --

Meanwhile I changed this code to print something like:

/tmp/drill/spill/27509954-4b86-b20a-f16a-c46644f74519_HashAgg_1-3_minor0/spill1

So the operator name "HashAgg_1-3" looks like what is presented in the web interface.
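The naming scheme Ben-Zvi describes can be sketched with `String.format`. The helper and its parameters are assumptions for illustration; the real code pulls the ids from the fragment handle and the operator definition.

```java
public class SpillDirNameDemo {

    // Build a spill directory name whose operator label ("HashAgg_1-3")
    // matches the web UI's convention: opName_majorFragment-operatorId,
    // followed by the minor fragment id.
    static String spillDirName(String queryId, String opName,
                               int majorFragmentId, int minorFragmentId, int operatorId) {
        return String.format("%s_%s_%d-%d_minor%d",
            queryId, opName, majorFragmentId, operatorId, minorFragmentId);
    }

    public static void main(String[] args) {
        System.out.println(spillDirName(
            "27509954-4b86-b20a-f16a-c46644f74519", "HashAgg", 1, 0, 3));
        // 27509954-4b86-b20a-f16a-c46644f74519_HashAgg_1-3_minor0
    }
}
```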
GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/761

DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read
* Ensure at least two batches are merged in low-memory conditions

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5284

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/761.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #761

commit 5558e9439805d595cfc9625591a276385454625f
Author: Paul Rogers
Date: 2017-02-24T18:31:25Z

    DRILL-5284: Roll-up of final fixes for managed sort

    See subtasks for details.

    * Provide detailed, accurate estimate of size consumed by a record batch
    * Managed external sort spills too often with Parquet data
    * Managed External Sort fails with OOM
    * External sort refers to the deprecated HDFS fs.default.name param
    * Config param drill.exec.sort.external.batch.size is not used
    * NPE in managed external sort while spilling to disk
    * External Sort BatchGroup leaks memory if an OOM occurs during read

commit 0028f26fef5d9b462700a28b689d47241ee3a1ce
Author: Paul Rogers
Date: 2017-02-24T20:45:18Z

    Fix for DRILL-5294

    Under certain low-memory conditions, the sort must be forced to merge
    two batches to make progress, even though this is a bit more than
    comfortably fits into memory.