[ https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160189#comment-16160189 ]
ASF GitHub Bot commented on DRILL-5694:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/938#discussion_r137939253

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -382,19 +390,25 @@ private void delayedSetup() {
         final boolean fallbackEnabled = context.getOptions().getOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY).bool_val;

         // Set the number of partitions from the configuration (raise to a power of two, if needed)
    -    numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS);
    -    if ( numPartitions == 1 ) {
    +    numPartitions = (int)context.getOptions().getOption(ExecConstants.HASHAGG_NUM_PARTITIONS_VALIDATOR);
    +    if ( numPartitions == 1 && is2ndPhase ) { // 1st phase can still do early return with 1 partition
           canSpill = false;
           logger.warn("Spilling is disabled due to configuration setting of num_partitions to 1");
         }
         numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in case not a power of 2

    -    if ( schema == null ) { estMaxBatchSize = 0; } // incoming was an empty batch
    +    if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = estMaxBatchSize = 0; } // incoming was an empty batch
         else { // Estimate the max batch size; should use actual data (e.g. lengths of varchars)
           updateEstMaxBatchSize(incoming);
         }
    -    long memAvail = memoryLimit - allocator.getAllocatedMemory();
    +    // create "reserved memory" and adjust the memory limit down
    +    reserveValueBatchMemory = reserveOutgoingMemory = estValuesBatchSize;
    +    long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory;
    +    long memAvail = newMemoryLimit - allocator.getAllocatedMemory();
    +    if ( memAvail <= 0 ) { throw new OutOfMemoryException("Too little memory available"); }
    +    allocator.setLimit(newMemoryLimit);
    +
    --- End diff --

    This code has grown incredibly complex, with many paths through the various functions. Tests are handy things. Do we have system-level unit tests that exercise each path through the code? Otherwise, as a reviewer, how can I be sure that each execution path does, in fact, work?

> hash agg spill to disk, second phase OOM
> ----------------------------------------
>
>                 Key: DRILL-5694
>                 URL: https://issues.apache.org/jira/browse/DRILL-5694
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT | d622f76ee6336d97c9189fc589befa7b0f4189d6 | DRILL-5165: For limit all case, no need to push down limit to scan | 21.07.2017 @ 10:36:29 PDT |
>
> Second phase agg ran out of memory. It is not supposed to. Test data currently only accessible locally:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
>
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), count(float_rand) from parquet_500m_v1 group by row_count order by row_count limit 30
>
> Failed with exception:
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0.
> Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.
> Fragment 1:6
>
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 4194304 due to memory limit. Current allocation: 534773760
>     org.apache.drill.exec.memory.BaseAllocator.buffer():238
>     org.apache.drill.exec.memory.BaseAllocator.buffer():213
>     org.apache.drill.exec.vector.IntVector.allocateBytes():231
>     org.apache.drill.exec.vector.IntVector.allocateNew():211
>     org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778
>     org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717
>     org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643
>     org.apache.drill.exec.test.generated.HashTableGen2141.put():618
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>
>   at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:593)
>   at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:215)
>   at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:140)
>   at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:220)
>   at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
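For context on the diff under review: the new code sets aside one estimated values-batch worth of "reserved memory" for spilling and one for the outgoing batch, then lowers the allocator's visible limit by both reserves so regular allocations cannot eat into them. The sketch below illustrates that arithmetic in isolation; `SimpleAllocator` is a hypothetical stand-in for Drill's `BaseAllocator`, and the numbers are the estimates reported in this issue (512 MB limit, 4849664-byte batch estimate), not values fixed by the patch.

```java
// Hedged sketch of the "reserved memory" scheme from the diff above.
// SimpleAllocator is a made-up stand-in, NOT Drill's BaseAllocator.
public class ReservedMemorySketch {

    /** Minimal allocator stand-in tracking a limit and current allocation. */
    static class SimpleAllocator {
        private long limit;
        private long allocated;
        SimpleAllocator(long limit) { this.limit = limit; }
        long getLimit() { return limit; }
        void setLimit(long limit) { this.limit = limit; }
        long getAllocatedMemory() { return allocated; }
    }

    /** Round up to the next power of two, as BaseAllocator.nextPowerOfTwo does. */
    static int nextPowerOfTwo(int v) {
        int highest = Integer.highestOneBit(v);
        return (highest == v) ? v : highest * 2;
    }

    public static void main(String[] args) {
        SimpleAllocator allocator = new SimpleAllocator(536_870_912L); // 512 MB, as in the report

        // Partition count is forced to a power of two, e.g. 30 -> 32.
        int numPartitions = nextPowerOfTwo(30);

        // Estimated size of one values batch (the 4849664 figure from the error message).
        long estValuesBatchSize = 4_849_664L;

        // Reserve one values batch for spilling and one for the outgoing batch,
        // then shrink the limit the rest of the operator sees by both reserves.
        long reserveValueBatchMemory = estValuesBatchSize;
        long reserveOutgoingMemory = estValuesBatchSize;
        long newMemoryLimit = allocator.getLimit() - reserveValueBatchMemory - reserveOutgoingMemory;

        long memAvail = newMemoryLimit - allocator.getAllocatedMemory();
        if (memAvail <= 0) {
            throw new IllegalStateException("Too little memory available");
        }
        allocator.setLimit(newMemoryLimit);

        System.out.println(numPartitions);        // 32
        System.out.println(allocator.getLimit()); // 527171584 = 536870912 - 2 * 4849664
    }
}
```

The point of the reviewer's concern is visible even in this toy version: the limit adjustment, the empty-schema zeroing, and the spill/no-spill branches each add paths through `delayedSetup()` that only targeted tests would exercise.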