[ https://issues.apache.org/jira/browse/DRILL-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Girish updated DRILL-6384:
-----------------------------------
Description: 
On the latest Apache master, we are observing multiple test failures. It looks like Drill runs out of direct memory, and queries fail with OOM. A few other queries probably fail because they are unable to connect to Drillbits. It looks like one of the recent commits caused this.

||Commit ID||Status||
|24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
|883c8d94b0021a83059fa79563dd516c4299b70a|FAIL|
|9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
|c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|

Two example queries and their exceptions are below. The query log is also attached.

*Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
{code}
select
  c.c_custkey,
  c.c_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
  c.c_acctbal,
  n.n_name,
  c.c_address,
  c.c_phone,
  c.c_comment
from
  customer c,
  orders o,
  lineitem l,
  nation n
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and o.o_orderdate >= date '1994-03-01'
  and o.o_orderdate < date '1994-03-01' + interval '3' month
  and l.l_returnflag = 'R'
  and c.c_nationkey = n.n_nationkey
group by
  c.c_custkey,
  c.c_name,
  c.c_acctbal,
  c.c_phone,
  n.n_name,
  c.c_address,
  c.c_comment
order by
  revenue desc
limit 20
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
Fragment 4:88
[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
  java.security.AccessController.doPrivileged():-2
  javax.security.auth.Subject.doAs():422
  org.apache.hadoop.security.UserGroupInformation.doAs():1595
  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
  org.apache.drill.common.SelfCleaningRunnable.run():38
  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
  java.lang.Thread.run():748
  at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
  at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
  at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
  at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
Fragment 4:88
[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
  java.security.AccessController.doPrivileged():-2
  javax.security.auth.Subject.doAs():422
  org.apache.hadoop.security.UserGroupInformation.doAs():1595
  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
  org.apache.drill.common.SelfCleaningRunnable.run():38
  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
  java.lang.Thread.run():748
{code}

*Query 2:* Advanced/tpch/tpch_sf100/parquet/08.q
{code}
select
  o_year,
  sum(case
    when nation = 'EGYPT' then volume
    else 0
  end) / sum(volume) as mkt_share
from
  (
    select
      extract(year from o.o_orderdate) as o_year,
      l.l_extendedprice * (1 - l.l_discount) as volume,
      n2.n_name as nation
    from
      part p,
      supplier s,
      lineitem l,
      orders o,
      customer c,
      nation n1,
      nation n2,
      region r
    where
      p.p_partkey = l.l_partkey
      and s.s_suppkey = l.l_suppkey
      and l.l_orderkey = o.o_orderkey
      and o.o_custkey = c.c_custkey
      and c.c_nationkey = n1.n_nationkey
      and n1.n_regionkey = r.r_regionkey
      and r.r_name = 'MIDDLE EAST'
      and s.s_nationkey = n2.n_nationkey
      and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
      and p.p_type = 'PROMO BRUSHED COPPER'
  ) as all_nations
group by
  o_year
order by
  o_year
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Failure allocating buffer.
Fragment 4:57
[Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
  io.netty.buffer.PooledByteBufAllocatorL.allocate():67
  org.apache.drill.exec.memory.AllocationManager.<init>():84
  org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
  org.apache.drill.exec.memory.BaseAllocator.buffer():241
  org.apache.drill.exec.memory.BaseAllocator.buffer():211
  org.apache.drill.exec.vector.VarCharVector.allocateNew():389
  org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
  org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
  org.apache.drill.exec.vector.AllocationHelper.allocate():54
  org.apache.drill.exec.vector.AllocationHelper.allocate():28
  org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
  org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
  org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
  org.apache.drill.exec.physical.impl.ScanBatch.next():175
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.record.AbstractRecordBatch.next():108
  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
  org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
  org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
  org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
  org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
  org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
  java.security.AccessController.doPrivileged():-2
  javax.security.auth.Subject.doAs():422
  org.apache.hadoop.security.UserGroupInformation.doAs():1595
  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
  org.apache.drill.common.SelfCleaningRunnable.run():38
  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
  java.lang.Thread.run():748
Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
  io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
  io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
  io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
  io.netty.buffer.PoolArena$DirectArena.newChunk():742
  io.netty.buffer.PoolArena.allocateNormal():244
  io.netty.buffer.PoolArena.allocate():226
  io.netty.buffer.PoolArena.allocate():146
  io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
  io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
  io.netty.buffer.PooledByteBufAllocatorL.allocate():65
  org.apache.drill.exec.memory.AllocationManager.<init>():84
  org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
  org.apache.drill.exec.memory.BaseAllocator.buffer():241
  org.apache.drill.exec.memory.BaseAllocator.buffer():211
  org.apache.drill.exec.vector.VarCharVector.allocateNew():389
  org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
  org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
  org.apache.drill.exec.vector.AllocationHelper.allocate():54
  org.apache.drill.exec.vector.AllocationHelper.allocate():28
  org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
  org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
  org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
  org.apache.drill.exec.physical.impl.ScanBatch.next():175
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.record.AbstractRecordBatch.next():108
  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
  org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
  org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
  org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.record.AbstractRecordBatch.next():118
  org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
  org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
  org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
  org.apache.drill.exec.record.AbstractRecordBatch.next():164
  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
  java.security.AccessController.doPrivileged():-2
  javax.security.auth.Subject.doAs():422
  org.apache.hadoop.security.UserGroupInformation.doAs():1595
  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
  org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 {code} was: On latest Apache master, we are observing that there are multiple test failures. It looks like Drill runs out of Direct memory and queries fail with OOM. Few other queries fail probably fail because they are unable to connect to Drillbits. It looks like one of the recent commits caused this. ||Commit ID||Status|| |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL| |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS| |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS| Two example queries + exceptions below. Also query log attached. *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q {code} select c.c_custkey, c.c_name, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, c.c_acctbal, n.n_name, c.c_address, c.c_phone, c.c_comment from customer c, orders o, lineitem l, nation n where c.c_custkey = o.o_custkey and l.l_orderkey = o.o_orderkey and o.o_orderdate >= date '1994-03-01' and o.o_orderdate < date '1994-03-01' + interval '3' month and l.l_returnflag = 'R' and c.c_nationkey = n.n_nationkey group by c.c_custkey, c.c_name, c.c_acctbal, c.c_phone, n.n_name, c.c_address, c.c_comment order by revenue desc limit 20 {code} Exception: {code} java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. Fragment 4:88 [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381 org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.physical.impl.BaseRootExec.next():105 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():95 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530) at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634) at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207) at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155) at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253) at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
java.lang.Thread.run(Thread.java:748) Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. Fragment 4:88 [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381 org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.physical.impl.BaseRootExec.next():105 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():95 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 {code} *Query 2:* Advanced/tpch/tpch_sf100/parquet/08.q {code} select o_year, 
sum(case when nation = 'EGYPT' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o.o_orderdate) as o_year, l.l_extendedprice * (1 - l.l_discount) as volume, n2.n_name as nation from part p, supplier s, lineitem l, orders o, customer c, nation n1, nation n2, region r where p.p_partkey = l.l_partkey and s.s_suppkey = l.l_suppkey and l.l_orderkey = o.o_orderkey and o.o_custkey = c.c_custkey and c.c_nationkey = n1.n_nationkey and n1.n_regionkey = r.r_regionkey and r.r_name = 'MIDDLE EAST' and s.s_nationkey = n2.n_nationkey and o.o_orderdate between date '1995-01-01' and date '1996-12-31' and p.p_type = 'PROMO BRUSHED COPPER' ) as all_nations group by o_year order by o_year {code} Exception: {code} java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. Failure allocating buffer. Fragment 4:57 [Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010] (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer. 
io.netty.buffer.PooledByteBufAllocatorL.allocate():67 org.apache.drill.exec.memory.AllocationManager.<init>():84 org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258 org.apache.drill.exec.memory.BaseAllocator.buffer():241 org.apache.drill.exec.memory.BaseAllocator.buffer():211 org.apache.drill.exec.vector.VarCharVector.allocateNew():389 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236 org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41 org.apache.drill.exec.vector.AllocationHelper.allocate():54 org.apache.drill.exec.vector.AllocationHelper.allocate():28 org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446 org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304 org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267 org.apache.drill.exec.physical.impl.ScanBatch.next():175 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.record.AbstractRecordBatch.next():108 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127 org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235 org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127 org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235 org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220 org.apache.drill.exec.record.AbstractRecordBatch.next():164 
org.apache.drill.exec.physical.impl.BaseRootExec.next():105 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():95 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368) io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510 io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464 io.netty.buffer.PoolArena$DirectArena.allocateDirect():766 io.netty.buffer.PoolArena$DirectArena.newChunk():742 io.netty.buffer.PoolArena.allocateNormal():244 io.netty.buffer.PoolArena.allocate():226 io.netty.buffer.PoolArena.allocate():146 io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169 io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201 io.netty.buffer.PooledByteBufAllocatorL.allocate():65 org.apache.drill.exec.memory.AllocationManager.<init>():84 org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258 org.apache.drill.exec.memory.BaseAllocator.buffer():241 org.apache.drill.exec.memory.BaseAllocator.buffer():211 org.apache.drill.exec.vector.VarCharVector.allocateNew():389 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236 org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41 org.apache.drill.exec.vector.AllocationHelper.allocate():54 
org.apache.drill.exec.vector.AllocationHelper.allocate():28 org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446 org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304 org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267 org.apache.drill.exec.physical.impl.ScanBatch.next():175 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.record.AbstractRecordBatch.next():108 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127 org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235 org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():118 org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127 org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235 org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220 org.apache.drill.exec.record.AbstractRecordBatch.next():164 org.apache.drill.exec.physical.impl.BaseRootExec.next():105 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():95 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279 org.apache.drill.common.SelfCleaningRunnable.run():38 
java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 {code} > TPC-H tests fail with OOM > ------------------------- > > Key: DRILL-6384 > URL: https://issues.apache.org/jira/browse/DRILL-6384 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow > Affects Versions: 1.14.0 > Reporter: Abhishek Girish > Priority: Critical > Attachments: drillbit.log.txt > > > On latest Apache master, we are observing that there are multiple test > failures. It looks like Drill runs out of Direct memory and queries fail with > OOM. Few other queries fail probably fail because they are unable to connect > to Drillbits. > It looks like one of the recent commits caused this. > ||Commit ID||Status|| > |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL| > |883c8d94b0021a83059fa79563dd516c4299b70a|FAIL| > |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS| > |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS| > Two example queries + exceptions below. Also query log attached. > *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q > {code} > select > c.c_custkey, > c.c_name, > sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, > c.c_acctbal, > n.n_name, > c.c_address, > c.c_phone, > c.c_comment > from > customer c, > orders o, > lineitem l, > nation n > where > c.c_custkey = o.o_custkey > and l.l_orderkey = o.o_orderkey > and o.o_orderdate >= date '1994-03-01' > and o.o_orderdate < date '1994-03-01' + interval '3' month > and l.l_returnflag = 'R' > and c.c_nationkey = n.n_nationkey > group by > c.c_custkey, > c.c_name, > c.c_acctbal, > c.c_phone, > n.n_name, > c.c_address, > c.c_comment > order by > revenue desc > limit 20 > {code} > Exception: > {code} > java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory > while executing the query. > AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. > values size: 1048576. Output alloc size: 1048576. 
Planned batches: 8 Memory > limit: 313709266 so far allocated: 2097152. > Fragment 4:88 > [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] > (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First > Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. > Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far > allocated: 2097152. > > org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419 > org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381 > > org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304 > org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 > org.apache.drill.exec.record.AbstractRecordBatch.next():164 > org.apache.drill.exec.physical.impl.BaseRootExec.next():105 > > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 > org.apache.drill.exec.physical.impl.BaseRootExec.next():95 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():279 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530) > at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634) > at > oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155) > at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253) > at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: > RESOURCE ERROR: One or more nodes ran out of memory while executing the query. > AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. > values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory > limit: 313709266 so far allocated: 2097152. > Fragment 4:88 > [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010] > (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First > Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. > Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far > allocated: 2097152. 
org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{code}

*Query 2*: Advanced/tpch/tpch_sf100/parquet/08.q
{code}
select
  o_year,
  sum(case
        when nation = 'EGYPT' then volume
        else 0
      end) / sum(volume) as mkt_share
from
  (
    select
      extract(year from o.o_orderdate) as o_year,
      l.l_extendedprice * (1 - l.l_discount) as volume,
      n2.n_name as nation
    from
      part p,
      supplier s,
      lineitem l,
      orders o,
      customer c,
      nation n1,
      nation n2,
      region r
    where
      p.p_partkey = l.l_partkey
      and s.s_suppkey = l.l_suppkey
      and l.l_orderkey = o.o_orderkey
      and o.o_custkey = c.c_custkey
      and c.c_nationkey = n1.n_nationkey
      and n1.n_regionkey = r.r_regionkey
      and r.r_name = 'MIDDLE EAST'
      and s.s_nationkey = n2.n_nationkey
      and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
      and p.p_type = 'PROMO BRUSHED COPPER'
  ) as all_nations
group by
  o_year
order by
  o_year
{code}

Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Failure allocating buffer.
Fragment 4:57

[Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
io.netty.buffer.PooledByteBufAllocatorL.allocate():67
org.apache.drill.exec.memory.AllocationManager.<init>():84
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
org.apache.drill.exec.memory.BaseAllocator.buffer():241
org.apache.drill.exec.memory.BaseAllocator.buffer():211
org.apache.drill.exec.vector.VarCharVector.allocateNew():389
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
org.apache.drill.exec.vector.AllocationHelper.allocate():54
org.apache.drill.exec.vector.AllocationHelper.allocate():28
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.record.AbstractRecordBatch.next():108
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
io.netty.buffer.PoolArena$DirectArena.newChunk():742
io.netty.buffer.PoolArena.allocateNormal():244
io.netty.buffer.PoolArena.allocate():226
io.netty.buffer.PoolArena.allocate():146
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
io.netty.buffer.PooledByteBufAllocatorL.allocate():65
org.apache.drill.exec.memory.AllocationManager.<init>():84
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
org.apache.drill.exec.memory.BaseAllocator.buffer():241
org.apache.drill.exec.memory.BaseAllocator.buffer():211
org.apache.drill.exec.vector.VarCharVector.allocateNew():389
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
org.apache.drill.exec.vector.AllocationHelper.allocate():54
org.apache.drill.exec.vector.AllocationHelper.allocate():28
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.record.AbstractRecordBatch.next():108
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
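For reference, the sizes reported in the two error messages work out as follows. This is quick arithmetic on the constants copied from the logs above, purely illustrative; it is not Drill's actual memory-accounting code:

```python
# Arithmetic on the constants from the two OOM messages above.
# All numeric literals are copied from the logs; the checks are illustrative only.

GiB = 1 << 30
MiB = 1 << 20

# Query 2: Netty direct-memory exhaustion.
used, maximum, requested = 34359738368, 34359738368, 16777216
assert used == maximum == 32 * GiB   # the entire 32 GiB direct-memory pool is in use
assert requested == 16 * MiB         # one more 16 MiB pool chunk pushes it over the max

# Query 1: HashAgg's first-phase estimate for fragment 4:88.
estimated_batch_size = 18481152      # bytes, ~17.6 MiB per batch
planned_batches = 8
memory_limit = 313709266             # bytes, ~299 MiB fragment limit
planned_need = planned_batches * estimated_batch_size
print(f"planned need: {planned_need / MiB:.1f} MiB "
      f"of a {memory_limit / MiB:.1f} MiB fragment limit")
```

Note that for query 1 only 2097152 bytes (2 MiB) had been allocated against the ~299 MiB fragment limit when the allocation failed, and the planned need (~141 MiB) is well under that limit. That is consistent with the node-wide 32 GiB direct-memory pool being exhausted across all fragments (as in query 2), rather than any single fragment exceeding its own limit.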