[ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157394#comment-13157394 ]

Jonathan Ellis commented on CASSANDRA-3533:
-------------------------------------------

I'd be curious if any of the other Dynamo-derived systems (Voldemort, Riak, ?) 
attempt to deal with this.  It's not clear to me how we should try to handle 
incomplete network graphs (A can talk to B and to C, but C can't talk to B).
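
To make the incomplete graph concrete, here is a minimal Python sketch. The
`links` set and both function names are hypothetical, and the gossip model is
deliberately simplified (an observer treats a target as alive whenever
heartbeat state can flow to it over any path); it is not Cassandra's actual
failure detector.

```python
# Hypothetical model: the network links that actually work.
# A can talk to B and to C, but there is no working C-B link.
links = {frozenset(("A", "B")), frozenset(("A", "C"))}


def can_reach(src, dst):
    """Direct reachability: is there a working link between src and dst?"""
    return frozenset((src, dst)) in links


def gossip_sees_alive(observer, target, nodes=("A", "B", "C")):
    """Simplified gossip view: the observer believes the target is alive if
    the target's heartbeat state can be relayed to it over any path."""
    seen, frontier = {target}, [target]
    while frontier:
        node = frontier.pop()
        for other in nodes:
            if other not in seen and can_reach(node, other):
                seen.add(other)
                frontier.append(other)
    return observer in seen


print(gossip_sees_alive("C", "B"))  # True: A relays B's state to C
print(can_reach("C", "B"))          # False: C cannot actually send to B
```

Under these assumptions C never marks B down, yet every request C sends
directly to B is lost, which matches the timeouts described in the issue.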
                
> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.4
>            Reporter: Vijay
>            Priority: Minor
>
> When one node in the cluster cannot talk to the other DC/RAC because of a
> firewall or network issue (StorageProxy calls fail), and the remote nodes are
> NOT marked down because at least one node in the cluster can still talk to
> the other DC/RAC, we get a TimeoutException instead of an
> UnavailableException.
> The problems with this:
> 1) It is hard to monitor and identify these errors.
> 2) It is hard for the client to differentiate a bad node from a bad query.
> 3) When this issue happens, we have to wait at least the RPC timeout to learn
> that the query won't succeed.
> Possible solution: when marking a node down, we might want to check whether
> the node is actually alive by trying to communicate with it directly, so that
> we can be sure of its real state.
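
The check proposed in the description could be sketched roughly as follows.
This is a minimal Python illustration, not Cassandra code: `Unavailable`,
`choose_replicas`, `probe` and `required` are hypothetical names, and `probe`
stands in for whatever cheap connectivity test the coordinator could run. The
idea is that the coordinator only counts replicas that are both alive
according to gossip and directly reachable from itself, and fails immediately
when too few remain instead of waiting for the RPC timeout.

```python
class Unavailable(Exception):
    """Hypothetical stand-in for an unavailable-style error."""


def choose_replicas(replicas, gossip_alive, probe, required):
    """Return the replicas that are gossip-alive and directly reachable.

    probe(node) -> bool is an assumed connectivity check run from this
    coordinator. Raises Unavailable right away when fewer than `required`
    replicas pass both checks.
    """
    reachable = [r for r in replicas if r in gossip_alive and probe(r)]
    if len(reachable) < required:
        raise Unavailable(
            f"only {len(reachable)} of {required} required replicas reachable"
        )
    return reachable


# Gossip reports both replicas alive, but a firewall blocks this
# coordinator's path to B.
gossip_alive = {"B", "C"}
blocked = {"B"}

try:
    choose_replicas(["B", "C"], gossip_alive, lambda n: n not in blocked, 2)
except Unavailable as exc:
    print("fail fast:", exc)
```

A real implementation would have to bound the cost of probing, for example by
caching results, so the extra check does not slow down every request.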


        
