[ https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946153#comment-15946153 ]
Paul Rogers commented on DRILL-4301:
------------------------------------

Much work was done in the managed version of the external sort to tightly control memory use and when to spill. Changed resolution version to 1.11 along with other managed sort fixes.

Please, if the example in the bug is wrong, find a correct example, test with the "old" sort to reproduce, then test with the managed version to verify a fix.

It may be possible to set up the "not enough batch groups to spill" issue under very peculiar circumstances, but the new code attempts to handle all of them. If all else fails, the new code puts a warning into the log file that an OOM is likely to occur if the sort is asked to sort data when there is insufficient memory to hold two incoming batches (which is the scenario defined here).

Anything else you want to see?

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> -----------------------------------------------------------------------------------
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Execution - Flow
> Affects Versions: 1.5.0
> Environment: 4 node cluster
> Reporter: Khurram Faraaz
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Query below in Functional tests, fails due to OOM
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version commit_id commit_message commit_time build_email build_time
> 1.5.0-SNAPSHOT 2f0e3f27e630d5ac15cdaef808564e01708c3c55 DRILL-4190 Don't hold on to batches from left side of merge join.
> 20.01.2016 @ 22:30:26 UTC Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q (connection: 808078113)
> [#1378] Query failed:
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>     at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>     at oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>     at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>     at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>     at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>     at oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>     at oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>     at oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>     at oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>     at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>     at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>     at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>     at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
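
For readers following the thread, the condition described in the comment (insufficient memory to hold two incoming batches) can be illustrated with a minimal sketch. This is not Drill's code: the class `SortMemoryCheck`, its method names, and the 30,000,000-byte batch size are hypothetical and chosen only for illustration. The two byte counts in `main` are the "allocated memory" and "allocator limit" values from the error message quoted above.

```java
public class SortMemoryCheck {

    /**
     * Hypothetical check: can the given memory limit hold at least two
     * incoming batches of the estimated size? Written with a division so
     * that very large batch sizes cannot overflow a long.
     */
    public static boolean canHoldTwoBatches(long memoryLimitBytes, long estimatedBatchBytes) {
        if (estimatedBatchBytes <= 0) {
            throw new IllegalArgumentException("estimatedBatchBytes must be positive");
        }
        return memoryLimitBytes / 2 >= estimatedBatchBytes;
    }

    /** Bytes by which the allocation exceeds the limit, or 0 if it fits. */
    public static long overshoot(long allocatedBytes, long limitBytes) {
        return Math.max(0L, allocatedBytes - limitBytes);
    }

    public static void main(String[] args) {
        long allocated = 48_326_272L; // "allocated memory" from the error message
        long limit = 46_684_427L;     // "allocator limit" from the error message
        System.out.println("Over limit by " + overshoot(allocated, limit) + " bytes");

        long assumedBatchBytes = 30_000_000L; // assumption, not taken from the report
        if (!canHoldTwoBatches(limit, assumedBatchBytes)) {
            // Stand-in for the log warning mentioned in the comment.
            System.err.println("WARN: memory limit cannot hold two incoming batches; OOM is likely");
        }
    }
}
```

The sketch only shows the shape of the check. How the managed sort actually estimates batch sizes and decides when to spill is not described in this thread.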