[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004707#comment-13004707 ]
Aaron Morton edited comment on CASSANDRA-2290 at 3/9/11 6:45 PM:
-----------------------------------------------------------------

Not sure if this helps. I found a place where AES was hanging while testing failure during a streaming transfer for CASSANDRA-2088 (against 0.7).

I broke the FileStreamTask so that it only sends one range and then closes the sending channel. IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().

While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):

java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting that it was open.

Update: Modified FileStreamTask to call System.exit() after sending the first section and got the same result.
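
To make the loop problem concrete, here is a minimal sketch of a receive loop that does check the return value of FileChannel.transferFrom(). This is not the actual IncomingStreamReader code; the class name TransferLoopSketch, the method readSection() and its parameters are made up for illustration. It also assumes a blocking source channel, where a 0-byte transfer means the peer has gone away rather than "try again later".

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not part of Cassandra.
public final class TransferLoopSketch {
    private TransferLoopSketch() {}

    /**
     * Copies exactly {@code length} bytes from {@code source} into {@code dest},
     * starting at file position {@code offset}.
     *
     * transferFrom() never returns -1; once the source hits end-of-stream it
     * just reports 0 bytes transferred. A loop that only tests
     * "bytesRead < length" therefore spins forever if the sender dies.
     */
    public static long readSection(ReadableByteChannel source,
                                   FileChannel dest,
                                   long offset,
                                   long length) throws IOException {
        long bytesRead = 0;
        while (bytesRead < length) {
            long n = dest.transferFrom(source, offset + bytesRead, length - bytesRead);
            if (n <= 0) {
                // Assumption: the source is in blocking mode, so no progress
                // means the stream ended before the section was complete.
                throw new EOFException("Stream ended after " + bytesRead
                        + " of " + length + " bytes");
            }
            bytesRead += n;
        }
        return bytesRead;
    }
}
```

With a check like this, the receiver fails with an exception when the sender closes the channel early, instead of looping on 0-byte reads. Whether to fail immediately or retry a bounded number of times is a design choice; the sketch takes the simplest option.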
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
>                 Key: CASSANDRA-2290
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Repair doesn't cope well with dead/dying neighbors. There are 2 problems:
> # Repair doesn't check if a node is dead before sending a TreeRequest; this
> is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with. The best approach is probably
> CASSANDRA-1740, however. That is, if we add a way to query the state of a
> repair, have that query correctly check all neighbors, and also add a way
> to cancel a repair, this would probably be enough.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira