[ 
https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221380#comment-13221380
 ] 

Simon Wilkinson commented on AVRO-1027:
---------------------------------------

No problem.

I've been thinking about how to construct the unit test for this. As James has 
said, it will be tricky.
Specifically, to force the disconnect, the server has to be closed after a 
client thread has acquired a NettyTransceiver's read lock in transceive(), but 
before the request gets very far in the NettyTransceiver's Netty pipeline.

There doesn't seem to be any way to make this happen consistently without 
adding some debug synchronization to NettyTransceiver.
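
To make "debug synchronization" concrete, here is a rough sketch of the kind of test hook I have in mind. It is not NettyTransceiver code: the class HookedTransceiverSketch, the two latches, and the serverOpen flag are all hypothetical stand-ins, and the "server close" is just a boolean flip. It only shows how a pair of latches could pin the client thread between acquiring the read lock and attempting the write.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a transceiver; none of these names exist in Avro.
class HookedTransceiverSketch {
    private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();

    // Test-only hooks (assumption): signal that the read lock is held,
    // then block until the test has closed the server.
    final CountDownLatch readLockHeld = new CountDownLatch(1);
    final CountDownLatch serverClosed = new CountDownLatch(1);

    // Stand-in for the real server/channel state.
    volatile boolean serverOpen = true;

    String transceive() throws InterruptedException {
        stateLock.readLock().lock();
        try {
            readLockHeld.countDown(); // tell the test we now hold the read lock
            serverClosed.await();     // pause here until the test closes the server
            // A real transceiver would write to the channel at this point.
            return serverOpen ? "written" : "write failed";
        } finally {
            stateLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        HookedTransceiverSketch t = new HookedTransceiverSketch();
        final String[] result = new String[1];
        Thread client = new Thread(() -> {
            try {
                result[0] = t.transceive();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.start();

        t.readLockHeld.await();     // client thread is inside transceive()
        t.serverOpen = false;       // stand-in for closing the server
        t.serverClosed.countDown(); // let the client proceed to the write
        client.join();

        System.out.println(result[0]); // prints "write failed"
    }
}
```

In a real test the hooks would have to be no-ops in production, which is part of why this feels intrusive.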
                
> NettyTransceiver will deadlock when attempting transceive/disconnect on the 
> same thread
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-1027
>                 URL: https://issues.apache.org/jira/browse/AVRO-1027
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.6.1
>            Reporter: Simon Wilkinson
>            Assignee: Simon Wilkinson
>             Fix For: 1.6.3
>
>         Attachments: AVRO-1027-v2.patch, AVRO-1027.patch
>
>
> If an Exception is caught while trying to write to a Channel, Netty can 
> deliver the Exception to a ChannelUpstreamHandler on the same thread that 
> attempted to write to the Channel. If this occurs with the 
> NettyClientAvroHandler implementation of ChannelUpstreamHandler then the 
> thread will deadlock.
> Specifically, NettyClientAvroHandler overrides the 
> ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, 
> which requires the NettyTransceiver's write lock. However, in the above 
> situation, the thread will already have locked the NettyTransceiver's read 
> lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading 
> from a read to a write lock, hence the thread deadlocks.
> Example stack trace (simplified):
> "SessionManager-TimeoutPoller" prio=10 tid=0x7b689c00 nid=0x375d waiting on condition [0x7b0ad000..0x7b0ade70]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xf2a944d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
> >>> [Acquire write lock] at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:285)
>     at org.apache.avro.ipc.NettyTransceiver.access$2(NettyTransceiver.java:281)
>     at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.exceptionCaught(NettyTransceiver.java:499)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
>     at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:473)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>     at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
>     at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:661)
>     at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:372)
>     at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:117)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
>     at org.jboss.netty.channel.Channels.write(Channels.java:632)
>     at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
>     at org.jboss.netty.channel.Channels.write(Channels.java:611)
>     at org.jboss.netty.channel.Channels.write(Channels.java:578)
>     at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)
> >>> [Acquire read lock] at org.apache.avro.ipc.NettyTransceiver.writeDataPack(NettyTransceiver.java:413)
> >>> [Acquire read lock] at org.apache.avro.ipc.NettyTransceiver.transceive(NettyTransceiver.java:394)
>     at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
>     at org.apache.avro.ipc.Requestor.request(Requestor.java:129)
>     at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:68)
>     <snip>
> Note, in Avro 1.6.1 the read lock is acquired in both 
> NettyTransceiver.transceive() and NettyTransceiver.writeDataPack(). AVRO-1013 
> fixes this so that it is acquired only once in NettyTransceiver.transceive().
> I've attached a patch that demonstrates a potential fix for the deadlock; the 
> patch assumes that AVRO-1013 has also been applied.
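
For anyone who wants to see the underlying lock behaviour from the description in isolation, the following is a minimal sketch using only java.util.concurrent. The class LockUpgradeDemo and the method tryUpgrade are hypothetical names for illustration and are not part of Avro or of the attached patches. It uses a timed tryLock() instead of lock() so that the failed upgrade shows up as a timeout rather than as a hung thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only: shows that a thread holding the read lock of a
// ReentrantReadWriteLock cannot acquire the write lock of the same lock.
class LockUpgradeDemo {

    // Returns true if the write lock could be acquired while this thread
    // already holds the read lock (it cannot, so this returns false).
    static boolean tryUpgrade(ReentrantReadWriteLock lock, long timeoutMillis)
            throws InterruptedException {
        lock.readLock().lock(); // analogous to the read lock taken in transceive()
        try {
            // Analogous to disconnect() asking for the write lock. With a plain
            // lock() call this would block forever; the timeout lets us observe it.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("upgrade succeeded: " + tryUpgrade(lock, 100));
        // prints "upgrade succeeded: false"
    }
}
```

The opposite direction (holding the write lock and then taking the read lock) is permitted by ReentrantReadWriteLock, which is why only the read-then-write ordering in the stack trace above is a problem.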

