[ https://issues.apache.org/jira/browse/CASSANDRA-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634649#comment-13634649 ]
Jason Brown commented on CASSANDRA-5393:
----------------------------------------

At the end of the day, this is what I see happening:

{code}
INFO [AntiEntropyStage:1] 2013-03-27 22:48:55,390 AntiEntropyService.java (line 239) repair #80fe25a0-9730-11e2-0000-ebe7011631ff Sending completed merkle tree to /54.246.XXX.YYY for (Geo,GeoCountryMetadata)
DEBUG [WRITE-/54.246.XXX.YYY] 2013-03-27 22:48:55,392 OutboundTcpConnection.java (line 165) error writing to ec2-54-246-XXX.YYY.eu-west-1.compute.amazonaws.com/54.246.XXX.YYY
java.net.SocketException: Connection timed out
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:358)
    at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:346)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:781)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:753)
    at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:100)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.cassandra.net.OutboundTcpConnection.write(OutboundTcpConnection.java:200)
    at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:152)
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:126)
{code}

The interesting thing is the "Connection timed out" exception message, rather than a connection reset (or something similar). So I suspect this is TCP keepalive timing out after the connection has already been broken.
I was able to reproduce this exception several times with my test cluster spread across three EC2 regions (us-west-2, us-east-1, eu-west-1; three nodes in each) and no traffic sent for multiple hours. Basically, I'm waiting for the connections to get dropped. When I then triggered repair on one of the nodes (usually starting with us-west-2), I could see the eu-west-1 nodes receive the request to build the merkle tree, but then fail to send the tree response with the above exception. I saw similar problems when trying a schema update after many hours of cluster idleness.

The attached patch catches the exception when the socket is dead (for whatever reason) and attempts a simple retry by requeueing the message at the end of the backlog queue, in the hope that the next pass will successfully recreate the socket. Note that I'm excluding MessagingService.DROPPABLE_VERBS from retries: it's OK to drop reads/mutations, but it's really the AES (AntiEntropyService) and other schema-related messages that I think we'd want to retry.

Admittedly, this is a simple mechanism that doesn't try anything fancy like exponential backoff, n levels of configurable retries, and so on. I'm open to discussion on that, but I'm not sure how much complexity we'd want to build in at this point. I think an incremental improvement would go a long way here: we're currently obscuring when messages can't be sent (which is OK for DROPPABLE_VERBS, but the other ones are really important), so added visibility and a retry mechanism will help.

> Add an Ack/Retry for merkle tree sending
> ----------------------------------------
>
>                 Key: CASSANDRA-5393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5393
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>         Attachments: 5393.patch
>
>
> Can we add an Ack/Retry around passing merkle trees around in repair?
> If the following fails, the repair hangs forever on the coordinating node.
> https://github.com/apache/cassandra/blob/cassandra-1.1.10/src/java/org/apache/cassandra/service/AntiEntropyService.java#L242
> {noformat}
>             Message message = TreeResponseVerbHandler.makeVerb(local, validator);
>             if (!validator.request.endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 logger.info(String.format("[repair #%s] Sending completed merkle tree to %s for %s", validator.request.sessionid, validator.request.endpoint, validator.request.cf));
>             ms.sendOneWay(message, validator.request.endpoint);
> {noformat}
> If the message asking for merkle trees gets lost, the coordinating node hangs forever as well.
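The requeue-on-failure behavior described in the comment above can be sketched roughly as follows. This is not the attached 5393.patch and not Cassandra's OutboundTcpConnection API: `Verb`, `Message`, `Transport`, and `OutboundQueueSketch` are hypothetical stand-ins, `DROPPABLE` merely plays the role of MessagingService.DROPPABLE_VERBS, and "a simple retry" is assumed here to mean a single requeue per message. On a failed write, a droppable verb is dropped, while anything else (such as a merkle tree response) is requeued once at the tail of the backlog, on the assumption that a later write goes out over a re-established socket. It needs Java 16+ for the record.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical verbs: only the droppable vs. non-droppable split matters here.
enum Verb { READ, MUTATION, TREE_REQUEST, TREE_RESPONSE, DEFINITIONS_UPDATE }

// Hypothetical message; 'retried' marks a message that has already been requeued once.
record Message(Verb verb, String payload, boolean retried) {
    Message(Verb verb, String payload) {
        this(verb, payload, false);
    }
}

// Hypothetical transport: write() throws IOException when the socket is dead
// (e.g. "Connection timed out"); a later call is assumed to reconnect.
interface Transport {
    void write(Message m) throws IOException;
}

final class OutboundQueueSketch {
    // Stand-in for MessagingService.DROPPABLE_VERBS: losing these is acceptable.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.READ, Verb.MUTATION);

    private final Deque<Message> backlog = new ArrayDeque<>();
    private final List<Message> dropped = new ArrayList<>();
    private final Transport transport;

    OutboundQueueSketch(Transport transport) {
        this.transport = transport;
    }

    void enqueue(Message m) {
        backlog.addLast(m);
    }

    // Messages given up on; kept (rather than silently discarded) for visibility.
    List<Message> dropped() {
        return dropped;
    }

    // Drains the backlog until it is empty and returns the messages written, in order.
    List<Message> drain() {
        List<Message> sent = new ArrayList<>();
        while (!backlog.isEmpty()) {
            Message m = backlog.pollFirst();
            try {
                transport.write(m);
                sent.add(m);
            } catch (IOException e) {
                if (DROPPABLE.contains(m.verb()) || m.retried()) {
                    // Droppable verb, or its one retry already failed: give up.
                    dropped.add(m);
                } else {
                    // Simple retry: requeue once at the tail of the backlog, hoping the
                    // next write goes out over a re-established socket.
                    backlog.addLast(new Message(m.verb(), m.payload(), true));
                }
            }
        }
        return sent;
    }
}
```

For example, with a transport whose first two writes fail, a queued MUTATION is dropped, while a TREE_RESPONSE is requeued and written on its second attempt. This only covers the sender-side retry; the Ack requested in the issue (so the coordinator does not hang forever when a tree request or response is lost) would additionally need something like a timeout on the coordinator.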