[ 
https://issues.apache.org/jira/browse/DRILL-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-6384:
--------------------------------------

    Assignee: Vitalii Diravka  (was: Arina Ielchiieva)

> TPC-H tests fail with OOM
> -------------------------
>
>                 Key: DRILL-6384
>                 URL: https://issues.apache.org/jira/browse/DRILL-6384
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Abhishek Girish
>            Assignee: Vitalii Diravka
>            Priority: Critical
>         Attachments: drillbit.log.txt
>
>
> On latest Apache master, we are observing that there are multiple test 
> failures. It looks like Drill runs out of Direct memory and queries fail with 
> OOM. Few other queries fail probably fail because they are unable to connect 
> to Drillbits.
> It looks like one of the recent commits caused this.
> ||Commit ID||Status||
> |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
> |883c8d94b0021a83059fa79563dd516c4299b70a|FAIL|
> |2601cdd33e0685f59a7bf2ac72541bd9dcaaa18f|FAIL|
> |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
> |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|
> Two example queries + exceptions below. Also query log attached. 
> *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
> {code}
>  select
>  c.c_custkey,
>  c.c_name,
>  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>  c.c_acctbal,
>  n.n_name,
>  c.c_address,
>  c.c_phone,
>  c.c_comment
>  from
>  customer c,
>  orders o,
>  lineitem l,
>  nation n
>  where
>  c.c_custkey = o.o_custkey
>  and l.l_orderkey = o.o_orderkey
>  and o.o_orderdate >= date '1994-03-01'
>  and o.o_orderdate < date '1994-03-01' + interval '3' month
>  and l.l_returnflag = 'R'
>  and c.c_nationkey = n.n_nationkey
>  group by
>  c.c_custkey,
>  c.c_name,
>  c.c_acctbal,
>  c.c_phone,
>  n.n_name,
>  c.c_address,
>  c.c_comment
>  order by
>  revenue desc
>  limit 20
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. 
> values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory 
> limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First 
> Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. 
> Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far 
> allocated: 2097152. 
>  
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
>  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
>  at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
>  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. 
> values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory 
> limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First 
> Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. 
> Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far 
> allocated: 2097152. 
>  
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> {code}
>  *Query 2:* 
> Advanced/tpch/tpch_sf100/parquet/08.q
> {code}
> select
> o_year,
> sum(case
> when nation = 'EGYPT' then volume
> else 0
> end) / sum(volume) as mkt_share
> from
> (
> select
> extract(year from o.o_orderdate) as o_year,
> l.l_extendedprice * (1 - l.l_discount) as volume,
> n2.n_name as nation
> from
> part p,
> supplier s,
> lineitem l,
> orders o,
> customer c,
> nation n1,
> nation n2,
> region r
> where
> p.p_partkey = l.l_partkey
> and s.s_suppkey = l.l_suppkey
> and l.l_orderkey = o.o_orderkey
> and o.o_custkey = c.c_custkey
> and c.c_nationkey = n1.n_nationkey
> and n1.n_regionkey = r.r_regionkey
> and r.r_name = 'MIDDLE EAST'
> and s.s_nationkey = n2.n_nationkey
> and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
> and p.p_type = 'PROMO BRUSHED COPPER'
> ) as all_nations
> group by
> o_year
> order by
> o_year
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> Failure allocating buffer.
> Fragment 4:57
> [Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating 
> buffer.
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():67
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     
> org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>   Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to 
> allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 
> 34359738368)
>     io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
>     io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
>     io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
>     io.netty.buffer.PoolArena$DirectArena.newChunk():742
>     io.netty.buffer.PoolArena.allocateNormal():244
>     io.netty.buffer.PoolArena.allocate():226
>     io.netty.buffer.PoolArena.allocate():146
>     
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
>     io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():65
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     
> org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to