[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

ASF GitHub Bot (JIRA) Sat, 09 Sep 2017 19:59:46 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160181#comment-16160181
 ]


ASF GitHub Bot commented on DRILL-5694:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/938#discussion_r137939336
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
    @@ -646,6 +687,46 @@ public AggOutcome doWork() {
       }
     
       /**
    +   *   Use reserved values memory (if available) to try and preemp an OOM
    +   */
    +  private void useReservedValuesMemory() {
    +    // try to preempt an OOM by using the reserved memory
    +    long reservedMemory = reserveValueBatchMemory;
    +    if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + 
reservedMemory); }
    +
    +    reserveValueBatchMemory = 0;
    +  }
    +  /**
    +   *   Use reserved outgoing output memory (if available) to try and 
preemp an OOM
    +   */
    +  private void useReservedOutgoingMemory() {
    +    // try to preempt an OOM by using the reserved memory
    +    long reservedMemory = reserveOutgoingMemory;
    +    if ( reservedMemory > 0 ) { allocator.setLimit(allocator.getLimit() + 
reservedMemory); }
    +
    +    reserveOutgoingMemory = 0;
    +  }
    +  /**
    +   *  Restore the reserve memory (both)
    +   *
    +   */
    +  private void restoreReservedMemory() {
    +    if ( 0 == reserveOutgoingMemory ) { // always restore OutputValues 
first (needed for spilling)
    +      long memAvail = allocator.getLimit() - 
allocator.getAllocatedMemory();
    +      if ( memAvail > estOutgoingAllocSize) {
    +        allocator.setLimit(allocator.getLimit() - estOutgoingAllocSize);
    --- End diff --
    
    Done this way, it is necessary for this reviewer to mentally track the 
current allocator limit. Can we ever subtract an amount twice? Add it twice?
    
    Now, it seems we should not alter the allocator. But, if we do, shouldn't 
it be based on absolute amounts, not relative deltas?


> hash agg spill to disk, second phase OOM
> ----------------------------------------
>
>                 Key: DRILL-5694
>                 URL: https://issues.apache.org/jira/browse/DRILL-5694
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory. Not suppose to. Test data currently only 
> accessible locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
>     
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to 
> allocate buffer of size 4194304 due to memory limit. Current allocation: 
> 534773760
>     org.apache.drill.exec.memory.BaseAllocator.buffer():238
>     org.apache.drill.exec.memory.BaseAllocator.buffer():213
>     org.apache.drill.exec.vector.IntVector.allocateBytes():231
>     org.apache.drill.exec.vector.IntVector.allocateNew():211
>     
> org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778
>     
> org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717
>     org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643
>     org.apache.drill.exec.test.generated.HashTableGen2141.put():618
>     
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173
>     org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>       at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>       at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:593)
>       at 
> oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:215)
>       at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:140)
>       at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:220)
>       at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

Reply via email to