[ https://issues.apache.org/jira/browse/DRILL-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160419#comment-16160419 ]

ASF GitHub Bot commented on DRILL-5749:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/925#discussion_r137955415
  
    --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
    @@ -54,10 +52,14 @@ void channelClosed(Throwable ex) {
         isOpen.set(false);
         if (ex != null) {
           final RpcException e = RpcException.mapException(ex);
    +      IntObjectHashMap<RpcOutcome<?>> clonedMap;
           synchronized (map) {
    -        map.forEach(new SetExceptionProcedure(e));
    +        clonedMap = map.clone();
             map.clear();
           }
    +      if (clonedMap != null) {
    --- End diff --
    
    Please do. The if statement tells readers that clonedMap could be null,
so we must try to sleuth out the conditions under which that occurs. Without
the check, I just go ahead and assume that Java is deterministic and that, once
we set variable x in a single-threaded environment, it stays at that value
until we change it...
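The pattern in the diff (copy and clear the pending listeners while holding the map lock, then notify them only after releasing it) can be sketched in isolation. This is a minimal sketch under stated assumptions, not Drill's code: `PendingRequests`, `Listener`, `register`, `failAll`, and `pendingCount` are hypothetical names, `java.util.HashMap` stands in for HPPC's `IntObjectHashMap`, and a copy constructor stands in for `clone()`. Because the copy is assigned unconditionally inside the `synchronized` block, it cannot be null afterwards, which is why the extra `if` adds nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for RequestIdMap (not the real class).
class PendingRequests {

  // Hypothetical callback type standing in for RpcOutcome.
  public interface Listener {
    void failed(Exception e);
  }

  private final Map<Integer, Listener> map = new HashMap<>();

  public void register(int coordinationId, Listener listener) {
    synchronized (map) {
      map.put(coordinationId, listener);
    }
  }

  public int pendingCount() {
    synchronized (map) {
      return map.size();
    }
  }

  // Fails every pending request, e.g. when the channel closes.
  public void failAll(Exception e) {
    final Map<Integer, Listener> snapshot;
    synchronized (map) {
      // Copy and clear under the lock; snapshot is always non-null here.
      snapshot = new HashMap<>(map);
      map.clear();
    }
    // Callbacks run without the map monitor held, so a listener that goes on
    // to take another lock (such as a connection lock) cannot complete a cycle
    // with a thread that holds that lock and is waiting for the map.
    for (Listener listener : snapshot.values()) {
      listener.failed(e);
    }
  }

  public static void main(String[] args) {
    PendingRequests requests = new PendingRequests();
    List<String> failures = new ArrayList<>();
    requests.register(1, ex -> failures.add("1: " + ex.getMessage()));
    requests.register(2, ex -> failures.add("2: " + ex.getMessage()));
    requests.failAll(new Exception("channel closed"));
    // prints "2 failed, 0 pending"
    System.out.println(failures.size() + " failed, " + requests.pendingCount() + " pending");
  }
}
```

The trade-off is one extra map copy per channel close in exchange for never invoking foreign code while the map monitor is held.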


> Foreman and Netty threads occure deadlock 
> ------------------------------------------
>
>                 Key: DRILL-5749
>                 URL: https://issues.apache.org/jira/browse/DRILL-5749
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: weijie.tong
>            Priority: Critical
>
> When the cluster was running a highly concurrent query workload and the
> reused control connection hit an exception, the Foreman and Netty threads
> each tried to acquire a lock held by the other, and they deadlocked. The
> Netty thread holds the map (RequestIdMap) lock and then tries to acquire
> the ReconnectingConnection lock to send a command, while the Foreman thread
> holds the ReconnectingConnection lock and then tries to acquire the
> RequestIdMap lock, so neither thread can proceed.
> Below is the jstack dump:
> Found one Java-level deadlock:
> =============================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>   waiting to lock monitor 0x00007f90de3b9648 (object 0x00000006b524d7e8, a com.carrotsearch.hppc.IntObjectHashMap),
>   which is held by "BitServer-2"
> "BitServer-2":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> Java stack information for the threads listed above:
> ===================================================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>       at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
>       - waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>       at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
>       at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
>       at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
>       at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
>       at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
>       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>       at java.lang.Thread.run(Thread.java:849)
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>       at org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:87)
>       - waiting to lock <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
>       at org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:153)
>       at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:115)
>       at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:89)
>       at org.apache.drill.exec.rpc.control.ControlConnection.send(ControlConnection.java:65)
>       at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:160)
>       at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:150)
>       at org.apache.drill.exec.rpc.ListeningCommand.connectionAvailable(ListeningCommand.java:38)
>       at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:75)
>       - locked <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>       at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
>       at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
>       at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
>       at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
>       at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
>       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>       at java.lang.Thread.run(Thread.java:849)
> "BitServer-2":
>       at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
>       - waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>       at org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:71)
>       at org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:220)
>       at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:968)
>       at org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:109)
>       at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1020)
>       at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1013)
>       at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
>       at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
>       at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1015)
>       at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1033)
>       at org.apache.drill.exec.work.foreman.Foreman$FragmentSubmitListener.failed(Foreman.java:1274)
>       at org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.failed(ListeningCommand.java:50)
>       at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
>       at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
>       at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
>       at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
>       at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
>       - locked <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
>       at org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:183)
>       at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:165)
>       at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:142)
>       at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:204)
>       at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:191)
>       at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>       at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>       at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>       at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>       at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
>       at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
>       at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
>       at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>       at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>       at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>       at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>       at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>       at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>       at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
>       at org.apache.drill.exec.rpc.RpcExceptionHandler.exceptionCaught(RpcExceptionHandler.java:39)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>       at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>       at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>       at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>       at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>       at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>       at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
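
The jstack output reduces to a two-lock cycle: "BitServer-2" holds the IntObjectHashMap monitor and waits for the ControlConnectionManager monitor, while the second foreman thread holds them in the opposite order. The hypothetical reproduction below (class, method, and thread names are illustrative, not Drill's) recreates that inversion with two plain Object monitors. It then uses the JDK's ThreadMXBean.findMonitorDeadlockedThreads() to detect the cycle programmatically. A latch guarantees that both threads hold their first lock before either requests its second, so the deadlock is deterministic.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the lock-order inversion from the report.
class LockInversionDemo {

  // Stand-ins for the RequestIdMap map monitor and the connection-manager monitor.
  static final Object requestMapLock = new Object();
  static final Object connectionLock = new Object();

  // Starts a daemon thread that takes `first`, waits until both threads hold
  // their first lock, then blocks trying to take `second`.
  static Thread lockInOrder(String name, Object first, Object second, CountDownLatch bothHeld) {
    Thread t = new Thread(() -> {
      synchronized (first) {
        bothHeld.countDown();
        try {
          bothHeld.await();
        } catch (InterruptedException e) {
          return;
        }
        synchronized (second) {
          // Never reached: each thread waits for the monitor the other holds.
        }
      }
    }, name);
    t.setDaemon(true); // lets the JVM exit even though these threads never finish
    t.start();
    return t;
  }

  // Polls for up to timeoutMillis and returns how many threads the JVM reports
  // as monitor-deadlocked (0 if none were found in time).
  static int detectDeadlock(long timeoutMillis) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
      if (ids != null) {
        return ids.length;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    CountDownLatch bothHeld = new CountDownLatch(2);
    // Netty-like path: map lock first, then connection lock.
    lockInOrder("BitServer", requestMapLock, connectionLock, bothHeld);
    // Foreman-like path: connection lock first, then map lock.
    lockInOrder("foreman", connectionLock, requestMapLock, bothHeld);
    System.out.println("deadlocked threads: " + detectDeadlock(5000));
  }
}
```

Either releasing the map monitor before running callbacks, as the diff does, or acquiring the two monitors in a single global order breaks the cycle.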



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
