[ 
https://issues.apache.org/jira/browse/CASSANDRA-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634649#comment-13634649
 ] 

Jason Brown commented on CASSANDRA-5393:
----------------------------------------

At the end of the day, this is what I see happening:

{code}INFO [AntiEntropyStage:1] 2013-03-27 22:48:55,390 AntiEntropyService.java (line 239) repair #80fe25a0-9730-11e2-0000-ebe7011631ff Sending completed merkle tree to /54.246.XXX.YYY for (Geo,GeoCountryMetadata)
DEBUG [WRITE-/54.246.XXX.YYY] 2013-03-27 22:48:55,392 OutboundTcpConnection.java (line 165) error writing to ec2-54-246-XXX.YYY.eu-west-1.compute.amazonaws.com/54.246.XXX.YYY
java.net.SocketException: Connection timed out
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:358)
    at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:346)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:781)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:753)
    at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:100)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.cassandra.net.OutboundTcpConnection.write(OutboundTcpConnection.java:200)
    at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:152)
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:126)
{code}

The interesting thing is the "Connection timed out" exception message, rather 
than socket reset (or something similar). So, I'm thinking this might be due to 
keepalive timing out after the connection is broken. I was able to reproduce 
this exception several times by having my test cluster set up in three EC2 
regions (us-west-2, us-east-1, eu-west-1 - three nodes in each), and not 
sending any traffic for multiple hours. Basically, I'm waiting for the 
connection to get dropped. Thus, when I went to trigger repair on one of the 
nodes (usually starting with us-west-2), I could see where the eu-west-1 nodes 
would get the request to build the merkle tree, but then failed on sending the 
tree response with the above exception. I was able to get similar problems when 
trying a schema update after many hours of cluster idleness.

The attached patch catches the exception when the socket is dead (for whatever 
reason), and attempts a simple retry by requeueing the message at the end of 
the backlog queue, with the hope that the next pass will successfully recreate 
the socket. Note that I'm excluding MessagingService.DROPPABLE_VERBS from 
retries as it's OK to drop reads/mutates, but it's really those AES and other 
schema-related messages that I think we'd want to retry.
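
As a rough illustration of the requeue idea described above (not the attached 
patch itself), a minimal Java sketch might look like the following. The class 
RequeueSketch, its Verb enum, the Transport interface and the DROPPABLE set are 
all hypothetical stand-ins for OutboundTcpConnection and 
MessagingService.DROPPABLE_VERBS, and the sketch assumes the failed message is 
requeued without any cap on the number of retries.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical, simplified stand-in for an outbound connection. */
class RequeueSketch {
    // Illustrative verbs only; not the real verb list.
    enum Verb { READ, MUTATION, TREE_RESPONSE, SCHEMA_UPDATE }

    // Assumed stand-in for MessagingService.DROPPABLE_VERBS.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.READ, Verb.MUTATION);

    static final class Message {
        final Verb verb;
        final String payload;
        Message(Verb verb, String payload) { this.verb = verb; this.payload = payload; }
    }

    /** Hypothetical transport: write() throws IOException when the socket is dead. */
    interface Transport {
        void write(Message m) throws IOException;
        void reconnect();
    }

    private final Queue<Message> backlog = new ArrayDeque<>();
    private final Transport transport;
    final List<Message> dropped = new ArrayList<>();

    RequeueSketch(Transport transport) { this.transport = transport; }

    void enqueue(Message m) { backlog.add(m); }

    /** Handles the message at the head of the backlog; returns false if the backlog is empty. */
    boolean processOne() {
        Message m = backlog.poll();
        if (m == null) return false;
        try {
            transport.write(m);
        } catch (IOException e) {
            // Socket is dead: reset it so the next pass can recreate it.
            transport.reconnect();
            if (DROPPABLE.contains(m.verb)) {
                dropped.add(m);   // fine to lose reads/mutations
            } else {
                backlog.add(m);   // requeue at the END of the backlog for a later pass
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final boolean[] firstWrite = {true};
        RequeueSketch conn = new RequeueSketch(new Transport() {
            public void write(Message m) throws IOException {
                if (firstWrite[0]) {
                    firstWrite[0] = false;
                    throw new IOException("Connection timed out");
                }
                System.out.println("sent " + m.verb + " " + m.payload);
            }
            public void reconnect() { System.out.println("reconnecting"); }
        });
        conn.enqueue(new Message(Verb.TREE_RESPONSE, "merkle tree"));
        while (conn.processOne()) { }
    }
}
```

Note that with a peer that never comes back, this sketch would keep requeueing 
a non-droppable message indefinitely; a real implementation would presumably 
need some bound on that.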

Admittedly this is a simple mechanism that doesn't try to do anything fancy 
like exponential backoff, n-levels of configurable retries, and so on. I'm open 
to discussion on that, but I'm not sure how much complexity we'd want to build 
in for that at this point. I think an incremental improvement would go a long 
way here as we're currently obscuring when messages can't be sent (which is OK 
for DROPPABLE_VERBS, but the other ones are really important), so 
added visibility and a retry mechanism will help. 
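
For reference in that discussion, here is one possible shape a capped 
exponential backoff could take. This is only a sketch and is not part of the 
patch: BackoffSketch, Attempt, Sleeper, and the base/max delay parameters are 
all hypothetical names chosen for illustration.

```java
import java.io.IOException;

/** Hypothetical sketch of bounded retries with capped exponential backoff. */
class BackoffSketch {
    interface Attempt { void run() throws IOException; }
    interface Sleeper { void sleep(long millis); }

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long factor = 1L << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(maxMillis, baseMillis * factor);
    }

    /** Tries once, then up to maxRetries more times; returns true on the first success. */
    static boolean sendWithRetries(Attempt attempt, int maxRetries,
                                   long baseMillis, long maxMillis, Sleeper sleeper) {
        for (int i = 0; i <= maxRetries; i++) {
            try {
                attempt.run();
                return true;
            } catch (IOException e) {
                if (i == maxRetries) break; // out of retries, give up
                sleeper.sleep(delayMillis(i, baseMillis, maxMillis));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        boolean ok = sendWithRetries(() -> {
            // Simulated send that fails twice before succeeding.
            if (++calls[0] < 3) throw new IOException("Connection timed out");
        }, 5, 100, 1000, millis -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("delivered=" + ok + " after " + calls[0] + " attempts");
    }
}
```

Whether this extra complexity is worth it over the simple requeue is exactly 
the open question above.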


 
                
> Add an Ack/Retry for merkle tree sending
> ----------------------------------------
>
>                 Key: CASSANDRA-5393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5393
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>         Attachments: 5393.patch
>
>
> Can we add an Ack/Retry around passing merkle trees around in repair?  If the 
> following fails, the repair hangs forever on the coordinating node.
> https://github.com/apache/cassandra/blob/cassandra-1.1.10/src/java/org/apache/cassandra/service/AntiEntropyService.java#L242
> {noformat}
>             Message message = TreeResponseVerbHandler.makeVerb(local, validator);
>             if (!validator.request.endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 logger.info(String.format("[repair #%s] Sending completed merkle tree to %s for %s", validator.request.sessionid, validator.request.endpoint, validator.request.cf));
>             ms.sendOneWay(message, validator.request.endpoint);
> {noformat}
> If the message asking for merkle trees gets lost, the coordinating node hangs 
> forever as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
