[ https://issues.apache.org/jira/browse/DRILL-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448510#comment-16448510 ]
Khurram Faraaz commented on DRILL-6329:
---------------------------------------

With both options set, TPC-DS query 66 still fails because one or more nodes run out of memory.

Apache Drill 1.14.0, commit 931b43e; SF1 parquet data on 4 nodes.

alter system set `drill.exec.hashagg.fallback.enabled` = true;
alter system set `planner.memory.max_query_memory_per_node` = 10737418240;

Stack trace for TPC-DS Query 66
{noformat}
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Too little memory available
Fragment 2:0

[Error Id: 7d2abddb-eda9-4dc0-90e5-d5486942813e on qa102-45.qa.lab:31010] (state=,code=0)
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Too little memory available
Fragment 2:0

[Error Id: 7d2abddb-eda9-4dc0-90e5-d5486942813e on qa102-45.qa.lab:31010]
	at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
	at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
	at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
	at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

> TPC-DS Query 66 failed due to OOM
> ---------------------------------
>
>                 Key: DRILL-6329
>                 URL: https://issues.apache.org/jira/browse/DRILL-6329
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Khurram Faraaz
>            Priority: Critical
>         Attachments: 252f0f20-2774-43d7-ec31-911ee0f5f330.sys.drill, TPCDS_Query_66.sql, TPCDS_Query_66_PLAN.txt
>
>
> TPC-DS Query 66 failed after 27 minutes on Drill 1.14.0 on a 4-node cluster against SF1 parquet data (dfs.tpcds_sf1_parquet_views). Query 66, the query profile, and the query plan are attached here.
> This seems to be a regression; the same query worked fine on 1.10.0.
> On Drill 1.10.0 (git.commit.id: bbcf4b76) => 9.026 seconds (completed successfully).
> On Drill 1.14.0 (git.commit.id.abbrev=da24113) query 66 failed after running for 27 minutes, due to OutOfMemoryException.
> Stack trace from the sqlline console; no stack trace was written to drillbit.log.
> {noformat}
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Too little memory available
> Fragment 2:0
>
> [Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Too little memory available
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Too little memory available
> Fragment 2:0
>
> [Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Too little memory available
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>
> ...
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Too little memory available
> Fragment 2:0
>
> [Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Too little memory available
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
>     org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
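For anyone reproducing this: the two workaround options discussed in the comment can be inspected and set from any Drill client. A minimal sketch, assuming a running Drill cluster; the option names are taken verbatim from this issue, and `sys.options` is Drill's standard options table, but its column layout varies across Drill versions, so all columns are selected rather than naming them.

```sql
-- Check the current values of the two options discussed in this issue.
-- (sys.options column layout varies across Drill versions, hence SELECT *.)
SELECT *
FROM sys.options
WHERE name IN ('drill.exec.hashagg.fallback.enabled',
               'planner.memory.max_query_memory_per_node');

-- Apply the settings from the comment. ALTER SYSTEM persists cluster-wide;
-- ALTER SESSION would scope them to the current connection only.
ALTER SYSTEM SET `drill.exec.hashagg.fallback.enabled` = true;
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 10737418240; -- 10 GiB
```

Note that per the comment above, these settings alone did not prevent the OOM on 1.14.0 for query 66, so they are a diagnostic step rather than a confirmed fix.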