[ https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160189#comment-16160189 ]
ASF GitHub Bot commented on DRILL-5694:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/938#discussion_r137939253

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -382,19 +390,25 @@ private void delayedSetup() {
         final boolean fallbackEnabled = context.getOptions().getOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY).bool_val;

         // Set the number of partitions from the configuration (raise to a power of two, if needed)
    -    numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS);
    -    if ( numPartitions == 1 ) {
    +    numPartitions = (int)context.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR);
    +    if ( numPartitions == 1 && is2ndPhase ) { // 1st phase can still do early return with 1 partition
           canSpill = false;
           logger.warn("Spilling is disabled due to configuration setting of num_partitions to 1");
         }
         numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in case not a power of 2

    -    if ( schema == null ) { estMaxBatchSize = 0; } // incoming was an empty batch
    +    if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = estMaxBatchSize = 0; } // incoming was an empty batch
         else { // Estimate the max batch size; should use actual data (e.g. lengths of varchars)
           updateEstMaxBatchSize(incoming);
         }
    -    long memAvail = memoryLimit - allocator.getAllocatedMemory();
    +    // create "reserved memory" and adjust the memory limit down
    +    reserveValueBatchMemory = reserveOutgoingMemory = estValuesBatchSize;
    +    long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory;
    +    long memAvail = newMemoryLimit - allocator.getAllocatedMemory();
    +    if ( memAvail <= 0 ) { throw new OutOfMemoryException("Too little memory available"); }
    +    allocator.setLimit(newMemoryLimit);
    +
    --- End diff --

    This code has grown incredibly complex, with many paths through the various functions. Tests are handy things. Do we have system-level unit tests that exercise each path through the code? Otherwise, as a reviewer, how can I be sure that each execution path does, in fact, work?

> hash agg spill to disk, second phase OOM
> ----------------------------------------
>
>                 Key: DRILL-5694
>                 URL: https://issues.apache.org/jira/browse/DRILL-5694
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT | d622f76ee6336d97c9189fc589befa7b0f4189d6 | DRILL-5165: For limit all case, no need to push down limit to scan | 21.07.2017 @ 10:36:29 PDT |
>
> Second phase agg ran out of memory. It is not supposed to. Test data currently only accessible locally:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
>
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), count(float_rand) from parquet_500m_v1 group by row_count order by row_count limit 30
>
> Failed with exception:
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0.
> Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.
> Fragment 1:6
>
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 4194304 due to memory limit. Current allocation: 534773760
>     org.apache.drill.exec.memory.BaseAllocator.buffer():238
>     org.apache.drill.exec.memory.BaseAllocator.buffer():213
>     org.apache.drill.exec.vector.IntVector.allocateBytes():231
>     org.apache.drill.exec.vector.IntVector.allocateNew():211
>     org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778
>     org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717
>     org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643
>     org.apache.drill.exec.test.generated.HashTableGen2141.put():618
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>
>   at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:593)
>   at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:215)
>   at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:140)
>   at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:220)
>   at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
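For context on the diff under review: the new code sets aside one estimated values-batch worth of "reserved memory" for spilling and one for the outgoing batch, then lowers the allocator's visible limit by both reserves so regular allocations cannot eat into them. The sketch below illustrates that arithmetic in isolation; `SimpleAllocator` is a hypothetical stand-in for Drill's `BaseAllocator`, and the numbers are the estimates reported in this issue (512 MB limit, 4849664-byte batch estimate), not values fixed by the patch.

```java
// Hedged sketch of the "reserved memory" scheme from the diff above.
// SimpleAllocator is a made-up stand-in, NOT Drill's BaseAllocator.
public class ReservedMemorySketch {

    /** Minimal allocator stand-in tracking a limit and current allocation. */
    static class SimpleAllocator {
        private long limit;
        private long allocated;
        SimpleAllocator(long limit) { this.limit = limit; }
        long getLimit() { return limit; }
        void setLimit(long limit) { this.limit = limit; }
        long getAllocatedMemory() { return allocated; }
    }

    /** Round up to the next power of two, as BaseAllocator.nextPowerOfTwo does. */
    static int nextPowerOfTwo(int v) {
        int highest = Integer.highestOneBit(v);
        return (highest == v) ? v : highest * 2;
    }

    public static void main(String[] args) {
        SimpleAllocator allocator = new SimpleAllocator(536_870_912L); // 512 MB, as in the report

        // Partition count is forced to a power of two, e.g. 30 -> 32.
        int numPartitions = nextPowerOfTwo(30);

        // Estimated size of one values batch (the 4849664 figure from the error message).
        long estValuesBatchSize = 4_849_664L;

        // Reserve one values batch for spilling and one for the outgoing batch,
        // then shrink the limit the rest of the operator sees by both reserves.
        long reserveValueBatchMemory = estValuesBatchSize;
        long reserveOutgoingMemory = estValuesBatchSize;
        long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory;

        long memAvail = newMemoryLimit - allocator.getAllocatedMemory();
        if (memAvail <= 0) {
            throw new IllegalStateException("Too little memory available");
        }
        allocator.setLimit(newMemoryLimit);

        System.out.println(numPartitions);        // 32
        System.out.println(allocator.getLimit()); // 527171584 = 536870912 - 2 * 4849664
    }
}
```

The point of the reviewer's concern is visible even in this toy version: the limit adjustment, the empty-schema zeroing, and the spill/no-spill branches each add paths through `delayedSetup()` that only targeted tests would exercise.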