[ https://issues.apache.org/jira/browse/DRILL-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166143#comment-16166143 ]
ASF GitHub Bot commented on DRILL-5749:
---------------------------------------

GitHub user weijietong opened a pull request:

    https://github.com/apache/drill/pull/943

    DRILL-5749: solve deadlock between foreman and netty threads

    @paul-rogers, please review this PR again. I failed to squash the commits in the last PR; sorry about that.

    For the related thread stacks, please see [DRILL-5749](https://issues.apache.org/jira/browse/DRILL-5749).

    The approach is to break the nested invocation that leads to the deadlock.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weijietong/drill drill-5749

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/943.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #943

----
commit b44f780a948c4a0898e7cee042c0590f0713f780
Author: weijietong <tongweijie...@gmail.com>
Date:   2017-06-08T08:03:46Z

    Merge pull request #1 from apache/master

    sync

commit d045c757c80a759b435479cc89f33c749fc16ac2
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-08-11T08:01:36Z

    Merge branch 'master' of github.com:weijietong/drill

commit 08b7006f4c70c45a17ebf7eae6beaa2bdb0d0454
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-08-20T12:05:51Z

    update

commit 9e9ebb497a183e61a72665019e6e04070d912027
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-08-20T12:07:41Z

    revert

commit 837d9fc58440fb584690f93b5f638ddcedf042a1
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-08-22T10:35:12Z

    Merge branch 'master' of github.com:apache/drill

commit b1fc840ad9d0a9959b05a84bfd17f17067def32d
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-08-29T16:39:48Z

    Merge branch 'master' of github.com:apache/drill

commit 52d7a0b795cf2ef29c596e84277cc01f1c105d19
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-09-14T11:55:26Z

    Merge branch 'master' of github.com:apache/drill

commit 2fbc23998ff5c8cb8a2a476221be856d69a559c4
Author: weijie.tong <weijie.t...@alipay.com>
Date:   2017-09-14T12:02:55Z

    solve deadlock occured between foreman and netty threads

----

> Foreman and Netty threads occure deadlock
> ------------------------------------------
>
>                 Key: DRILL-5749
>                 URL: https://issues.apache.org/jira/browse/DRILL-5749
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>   Affects Versions: 1.10.0, 1.11.0
>            Reporter: weijie.tong
>            Priority: Critical
>
> When the cluster is under high query concurrency and a reused control
> connection hits an exception, the foreman and netty threads each try to
> acquire a lock that the other holds, and a deadlock occurs. The netty thread
> holds the map (RequestIdMap) lock and then tries to acquire the
> ReconnectingConnection lock to send a command, while the foreman thread holds
> the ReconnectingConnection lock and then tries to acquire the RequestIdMap
> lock.
>
> Below is the jstack dump:
>
> Found one Java-level deadlock:
> =============================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>   waiting to lock monitor 0x00007f90de3b9648 (object 0x00000006b524d7e8, a com.carrotsearch.hppc.IntObjectHashMap),
>   which is held by "BitServer-2"
> "BitServer-2":
>   waiting to lock monitor 0x00007f935b721f48 (object 0x0000000656affc40, a org.apache.drill.exec.rpc.control.ControlConnectionManager),
>   which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
>
> Java stack information for the threads listed above:
> ===================================================
> "265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
>     at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
>     - waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>     at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
>     at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
>     at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
>     at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:849)
> "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
>     at org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:87)
>     - waiting to lock <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
>     at org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:153)
>     at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:115)
>     at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:89)
>     at org.apache.drill.exec.rpc.control.ControlConnection.send(ControlConnection.java:65)
>     at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:160)
>     at org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:150)
>     at org.apache.drill.exec.rpc.ListeningCommand.connectionAvailable(ListeningCommand.java:38)
>     at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:75)
>     - locked <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>     at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
>     at org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
>     at org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
>     at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:849)
> "BitServer-2":
>     at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
>     - waiting to lock <0x0000000656affc40> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
>     at org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:71)
>     at org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:220)
>     at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:968)
>     at org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:109)
>     at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1020)
>     at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1013)
>     at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
>     at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
>     at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1015)
>     at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1033)
>     at org.apache.drill.exec.work.foreman.Foreman$FragmentSubmitListener.failed(Foreman.java:1274)
>     at org.apache.drill.exec.rpc.ListeningCommand$DeferredRpcOutcome.failed(ListeningCommand.java:50)
>     at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
>     at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
>     at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
>     at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
>     at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
>     - locked <0x00000006b524d7e8> (a com.carrotsearch.hppc.IntObjectHashMap)
>     at org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:183)
>     at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:165)
>     at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:142)
>     at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:204)
>     at org.apache.drill.exec.rpc.ReconnectingConnection$CloseHandler.operationComplete(ReconnectingConnection.java:191)
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>     at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>     at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>     at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>     at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>     at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>     at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>     at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
>     at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
>     at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
>     at org.apache.drill.exec.rpc.RpcExceptionHandler.exceptionCaught(RpcExceptionHandler.java:39)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>     at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>     at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>     at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>     at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
>     at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
>     at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
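
For readers following the lock-order inversion in the issue description, here is a minimal sketch of one common way to break this kind of nested invocation: copy the pending listeners while holding the map lock, release the lock, and only then run the callbacks. This is not the patch from the pull request. `ListenerRegistry`, `Listener`, `register`, `size`, and `channelClosed` here are hypothetical stand-ins, not Drill's `RequestIdMap` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a request-id-to-listener map; not Drill's RequestIdMap. */
final class ListenerRegistry {

  /** Hypothetical callback for an outstanding request. */
  interface Listener {
    void failed(Exception cause);
  }

  private final Map<Integer, Listener> pending = new HashMap<>();
  private int nextId = 0;

  /** Registers a listener and returns its request id; the map lock is held only briefly. */
  int register(Listener listener) {
    synchronized (pending) {
      int id = nextId++;
      pending.put(id, listener);
      return id;
    }
  }

  /** Number of listeners still waiting for an outcome. */
  int size() {
    synchronized (pending) {
      return pending.size();
    }
  }

  /**
   * Fails every outstanding listener after the channel closes.
   * Only the snapshot-and-clear step runs under the map lock. The callbacks
   * run after the lock is released, so a callback that needs another lock
   * (or calls register) cannot form a lock-order cycle through this map.
   */
  void channelClosed(Exception cause) {
    List<Listener> snapshot;
    synchronized (pending) {
      snapshot = new ArrayList<>(pending.values());
      pending.clear();
    }
    for (Listener listener : snapshot) {
      listener.failed(cause); // the map lock is not held here
    }
  }

  public static void main(String[] args) {
    ListenerRegistry registry = new ListenerRegistry();
    registry.register(cause -> System.out.println("failed: " + cause.getMessage()));
    registry.channelClosed(new RuntimeException("channel closed"));
  }
}
```

The trade-off in this sketch is that a listener registered after the snapshot is taken is not failed by that `channelClosed` call, so a real implementation would need its own rule for requests that arrive during a close.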