[ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157394#comment-13157394 ]
Jonathan Ellis commented on CASSANDRA-3533:
-------------------------------------------

I'd be curious if any of the other Dynamo-derived systems (Voldemort, Riak, ?) attempt to deal with this. It's not clear to me how we should try to handle incomplete network graphs (A can talk to B and to C, but C can't talk to B).

> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.4
>            Reporter: Vijay
>            Priority: Minor
>
> When one node in the cluster is not able to talk to the other DC/RAC due to
> a firewall or network-related issue (StorageProxy calls fail), and the nodes
> are NOT marked down because at least one node in the cluster can talk to the
> other DC/RAC, we get a TimeoutException instead of an UnavailableException.
> The problems with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard for the client to differentiate between a bad node and a bad
> query.
> 3) When this issue happens, we have to wait for at least the RPC timeout to
> know that the query won't succeed.
> Possible solution: when deciding whether to mark a node down, we might check
> that it is actually alive by trying to communicate with it directly.
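
To make the incomplete-graph case concrete, here is a toy model of the A/B/C example. It is only a sketch and not Cassandra code: the names `LINKS`, `can_talk`, `gossip_alive`, `classify_request`, and the `probe_first` flag are all hypothetical, links are assumed to be symmetric, and liveness is assumed to spread transitively through any node that can reach the target. Under those assumptions, C still believes B is alive (via A), so a request from C to B times out rather than failing fast, while a direct probe along the lines of the "possible solution" would turn it into an unavailable error.

```python
# Toy model of an incomplete network graph; this is NOT Cassandra code.
# All names here are hypothetical and exist only for illustration.

# Assumed symmetric direct links: A-B and A-C work, B-C is blocked by a firewall.
LINKS = {frozenset(("A", "B")), frozenset(("A", "C"))}


def can_talk(src, dst):
    """Return True if src can reach dst directly."""
    return src == dst or frozenset((src, dst)) in LINKS


def gossip_alive(observer, nodes):
    """Return the nodes the observer believes are alive.

    Assumption: liveness information spreads transitively, which stands in for
    "nodes are NOT marked down because at least one node can talk to them".
    """
    alive = {observer}
    frontier = [observer]
    while frontier:
        current = frontier.pop()
        for node in nodes:
            if node not in alive and can_talk(current, node):
                alive.add(node)
                frontier.append(node)
    return alive


def classify_request(coordinator, replica, nodes, probe_first=False):
    """Return the outcome of a request from coordinator to replica."""
    if replica not in gossip_alive(coordinator, nodes):
        return "UnavailableException"  # replica is believed down: fail fast
    if probe_first and not can_talk(coordinator, replica):
        return "UnavailableException"  # direct probe failed: fail fast
    if not can_talk(coordinator, replica):
        return "TimeoutException"  # believed alive, but the request never arrives
    return "OK"


NODES = ["A", "B", "C"]
print(classify_request("C", "B", NODES))                    # TimeoutException
print(classify_request("C", "B", NODES, probe_first=True))  # UnavailableException
print(classify_request("A", "B", NODES))                    # OK
```

The sketch leaves the harder question open: with a probe, C would consider B unavailable while A considers it healthy, so the nodes no longer agree on cluster state.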