[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222805#comment-15222805
 ] 

ASF GitHub Bot commented on DRILL-3714:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/442#discussion_r58290979
  
    --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcBus.java ---
    @@ -159,19 +159,15 @@ public ChannelClosedHandler(C clientConnection, 
Channel channel) {
         @Override
         public void operationComplete(ChannelFuture future) throws Exception {
           String msg;
    -      if(local!=null) {
    +      if(local != null) {
             msg = String.format("Channel closed %s <--> %s.", local, remote);
           }else{
             msg = String.format("Channel closed %s <--> %s.", 
future.channel().localAddress(), future.channel().remoteAddress());
           }
     
    -      if (RpcBus.this.isClient()) {
    -        if(local != null) {
    -          logger.info(String.format(msg));
    -        }
    -      } else {
    -        queue.channelClosed(new ChannelClosedException(msg));
    -      }
    +      logger.info(msg); // should we leave this at info level ?
    +
    +      queue.channelClosed(new ChannelClosedException(msg));
    --- End diff --
    
    @sudheeshkatkam this is also true for the user tunnel. We have a single 
UserServer on each drillbit that has connections with multiple clients, if one 
connection is closed, the queue will fail all batches waiting for outcome. See 
[DRILL-3763](https://issues.apache.org/jira/browse/DRILL-3763)
    
    @jacques-n let me wrap my head around your suggestion. In the meantime, I 
may have a simple fix that would resolve the issue in the short-term: we just 
need to pass the closed channel to the coordination queue and only fail the 
listeners related to the closed channel.


> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> ------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3714
>                 URL: https://issues.apache.org/jira/browse/DRILL-3714
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Jacques Nadeau
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when 
> hitting OOM condition.
> Query runs out of memory during execution and remains in 
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
>         Single node drillbit installation.
>         DRILL_MAX_DIRECT_MEMORY="8G"
>         DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
> ~[na:1.7.0_71]
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
> ~[na:1.7.0_71]
>         at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) 
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:4.0.27.Final]
>         at 
> io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         ... 11 common frames omitted
> 2015-08-26 16:56:31,087 [BitServer-5] INFO  
> o.a.d.exec.rpc.ProtobufLengthDecoder - Channel is closed, discarding 
> remaining 124958 byte(s) in buffer.
> 2015-08-26 16:56:31,087 [BitClient-1] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:54554 <--> /10.10.88.133:31012 (data client).  
> Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-26 16:56:31,088 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:54554 <--> 
> /10.10.88.133:31012.
> 2015-08-26 16:56:35,325 [BitServer-6] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54555 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> Attached:
>         query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>         drillbit.log
>         jstack.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to