[ https://issues.apache.org/jira/browse/CASSANDRA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1019:
--------------------------------

    Attachment: 1019-for-0.6-0001-Add-exponentially-backed-off-retry-to-FileStreamTask.patch

Adds retries with exponential backoff to the connect step in FileStreamTask.

> "java.net.ConnectException: Connection timed out" in MESSAGE-STREAMING-POOL:1
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1019
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1019
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: B. Todd Burruss
>            Assignee: Stu Hood
>             Fix For: 0.6.3
>
>         Attachments:
>             1019-for-0.6-0001-Add-exponentially-backed-off-retry-to-FileStreamTask.patch,
>             1019-for-trunk-0001-Rename-CompletedFileStatus-to-FileStatus-to-indicate.patch,
>             1019-for-trunk-0002-Rename-StreamCompletionHandler-to-FileStatusHandler-.patch,
>             1019-for-trunk-0003-Rename-StreamCompletionAction-to-Action-and-change-d.patch
>
>
> after doing a nodetool repair on a node in my cluster, i see the following
> exception on 4 out of the 7 nodes. replication factor is 3. no compactions
> happening. no client traffic to the cluster. nodetool streams (on one of
> the nodes not repaired) shows the following which is not ever increasing:
>
> Mode: Normal
> Streaming to: /192.168.132.117
>    /data/cassandra-data/data/UdsProfiles/stream/UdsProfiles-43-Data.db 0/523088443
> Not receiving any streams.
>
> in addition those same four nodes all show AE-SERVICE-STAGE with pending
> work, and been showing this for several hours now. each node in the
> cluster has less than 2gb, so it should be finished by now.
>
> here is the exception:
>
> 2010-04-23 10:08:43,416 ERROR [MESSAGE-STREAMING-POOL:1] [DebuggableThreadPoolExecutor.java:101] Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.net.ConnectException: Connection timed out
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>     at sun.nio.ch.Net.connect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>     at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 3 more
>
> 2010-04-23 10:08:43,417 ERROR [MESSAGE-STREAMING-POOL:1] [CassandraDaemon.java:78] Fatal exception in thread Thread[MESSAGE-STREAMING-POOL:1,5,main]
> java.lang.RuntimeException: java.net.ConnectException: Connection timed out
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>     at sun.nio.ch.Net.connect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>     at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 3 more
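
For readers without the patch at hand, the idea of retrying the connect step with exponential backoff can be sketched roughly as follows. This is only an illustration and is not taken from the attached patch: the class name BackoffConnect, the methods delayMillis and connectWithRetry, and all parameters are hypothetical, and the actual change to FileStreamTask may differ in structure, limits and error handling.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

// Hypothetical helper for illustration only; not the code from the patch.
final class BackoffConnect {

    // Delay before the retry that follows failed attempt number `attempt`
    // (0-based): initialMs doubled `attempt` times, never exceeding capMs.
    static long delayMillis(int attempt, long initialMs, long capMs) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long delay = initialMs;
        // Stop doubling once the cap is reached.
        for (int i = 0; i < attempt && delay < capMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMs);
    }

    // Tries a blocking connect up to maxAttempts times, sleeping with an
    // exponentially growing delay between failures. Rethrows the last
    // IOException (for example a ConnectException) if every attempt fails.
    static SocketChannel connectWithRetry(InetSocketAddress address, int maxAttempts,
                                          long initialMs, long capMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            SocketChannel channel = SocketChannel.open();
            try {
                channel.connect(address);
                return channel; // caller owns and closes the channel
            } catch (IOException e) {
                last = e;
                try {
                    channel.close();
                } catch (IOException ignored) {
                    // best-effort cleanup of the failed channel
                }
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, initialMs, capMs));
                }
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be positive");
    }
}
```

With assumed values such as an initial delay of 100 ms and a cap of 10000 ms, the waits between attempts would be 100, 200, 400, 800 ms and so on until the cap, so a briefly unreachable peer gets several chances before the streaming thread gives up with the original exception.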