[ https://issues.apache.org/jira/browse/CASSANDRA-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185796#comment-14185796 ]
Vijay commented on CASSANDRA-8188:
----------------------------------

I had the same solution as part of https://issues.apache.org/jira/secure/attachment/12623900/0001-CASSANDRA-6590.patch, but [~brandon.williams] was seeing some weirdness. I was not able to replicate that, though.

> don't block SocketThread for MessagingService
> ---------------------------------------------
>
>                 Key: CASSANDRA-8188
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8188
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: yangwei
>            Assignee: yangwei
>         Attachments: 0001-don-t-block-SocketThread-for-MessagingService.patch
>
>
> We have two datacenters, A and B.
> A node in A cannot handshake version with the nodes in B. The logs on A are as follows:
> {noformat}
>   INFO [HANDSHAKE-/B] 2014-10-24 04:29:49,075 OutboundTcpConnection.java (line 395) Cannot handshake version with B
> TRACE [WRITE-/B] 2014-10-24 11:02:49,044 OutboundTcpConnection.java (line 368) unable to connect to /B
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.Net.connect0(Native Method)
>         at sun.nio.ch.Net.connect(Net.java:364)
>         at sun.nio.ch.Net.connect(Net.java:356)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
>         at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:134)
>         at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:119)
>         at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:299)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:150)
> {noformat}
>
> The jstack output of the nodes in B shows SocketThread blocked in inputStream.readInt, so it no longer accepts new sockets:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         - locked <0x00000007963747e8> (a java.lang.Object)
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:203)
>         - locked <0x0000000796374848> (a java.lang.Object)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         - locked <0x00000007a5c7ca88> (a sun.nio.ch.SocketAdaptor$SocketInputStream)
>         at java.io.InputStream.read(InputStream.java:101)
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
>         - locked <0x00000007a5c7ca88> (a sun.nio.ch.SocketAdaptor$SocketInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:879)
> {noformat}
>
> On the nodes in B, tcpdump shows retransmission of SYN,ACK during the TCP three-way handshake, because the TCP implementation drops the final ACK when the backlog queue is full.
> On the nodes in B, ss -tl shows "Recv-Q 51 Send-Q 50".
>
> On the nodes in B, netstat -s shows that “SYNs to LISTEN sockets dropped” and “times the listen queue of a socket overflowed” are both increasing.
> This patch sets the read timeout to 2 * OutboundTcpConnection.WAIT_FOR_VERSION_MAX_TIME for the accepted socket.
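
For readers following the description above, here is a minimal sketch of the general technique it names: putting a read timeout on an accepted socket so that a silent peer cannot hold the accepting thread inside readInt forever. This is not the attached patch. The class name, the helper methods, and the 100 ms value standing in for OutboundTcpConnection.WAIT_FOR_VERSION_MAX_TIME are all assumptions made for illustration, and the sketch uses a plain java.net.ServerSocket rather than Cassandra's own socket setup.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class AcceptTimeoutSketch {
    // Hypothetical stand-in for OutboundTcpConnection.WAIT_FOR_VERSION_MAX_TIME;
    // the real value is not given in the issue, so 100 ms is assumed here.
    static final int WAIT_FOR_VERSION_MAX_TIME_MS = 100;

    /**
     * Reads the 4-byte header from an accepted socket.
     * Returns null if the peer sends nothing within the timeout.
     */
    static Integer readHeader(Socket socket) throws IOException {
        // Bound every blocking read on this socket's input stream.
        socket.setSoTimeout(2 * WAIT_FOR_VERSION_MAX_TIME_MS);
        DataInputStream in = new DataInputStream(socket.getInputStream());
        try {
            return in.readInt();
        } catch (SocketTimeoutException e) {
            // The peer connected but stayed silent; give up instead of blocking.
            return null;
        }
    }

    /** Connects a client that never writes and reports whether the read timed out. */
    static boolean silentPeerTimesOut() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return readHeader(accepted) == null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + silentPeerTimesOut());
    }
}
```

With the timeout in place, the accepting loop can close the stalled socket and return to accept(), so the listen backlog keeps draining even when a peer connects and never sends its header.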