[ 
https://issues.apache.org/jira/browse/CASSANDRA-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915386#comment-13915386
 ] 

Ananthkumar K S commented on CASSANDRA-6772:
--------------------------------------------

[~brandon.williams] Agreed. If that's so, can someone let me know whether it's a 
bug? Moreover, a firewall problem would not let TCP connections happen between 
the two nodes at all. But here, as I mentioned, Cassandra was retrying at the 
network layer, and the retries were visible in netstat on both servers. We 
cannot replicate such a scenario, as we have 60 other applications running on 
the same private link. So, as a use case, it should be normal for Cassandra to 
detect and re-establish the connection once the link comes back up. When I 
reported a similar kind of problem, an infinite loop was introduced to nullify 
these kinds of race conditions. But it doesn't solve the problem; it only 
creates more load on TCP. Can you please review that part for such a scenario?
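As a side note, the FIN_WAIT1 / CLOSE_WAIT build-up we saw in netstat can be summarized with a small filter. This is only an illustrative sketch, assuming the default Cassandra storage_port of 7000 (adjust the port if your cluster uses a different one); `count_states` is a hypothetical helper name, not part of any tool:

```shell
# Sketch: count inter-node (storage port 7000) TCP connections by state,
# to spot the FIN_WAIT1 / CLOSE_WAIT build-up described above.
count_states() {
  # Reads netstat-style lines on stdin; prints "STATE count" pairs.
  grep ':7000 ' | awk '{print $6}' | sort | uniq -c | awk '{print $2, $1}'
}

# Example on captured output (live, you would run: netstat -tan | count_states):
printf '%s\n' \
  'tcp 0 0 10.0.0.1:7000 10.1.0.1:52100 FIN_WAIT1' \
  'tcp 0 0 10.0.0.1:7000 10.1.0.2:52102 FIN_WAIT1' \
  'tcp 0 0 10.0.0.1:52110 10.1.0.1:7000 CLOSE_WAIT' | count_states
# CLOSE_WAIT 1
# FIN_WAIT1 2
```

Running this on both ends of the private link every few seconds would show whether the stuck states drain after the link recovers.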

> Cassandra inter data center communication broken
> ------------------------------------------------
>
>                 Key: CASSANDRA-6772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6772
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 6.0
>            Reporter: Ananthkumar K S
>            Priority: Blocker
>
> I have two data centers, DC1 and DC2. Both communicate via a private link. 
> Yesterday, we had a problem with the private link for 10 minutes. From the 
> time the problem was resolved, nodes in both data centers have not been able 
> to communicate with each other. When I do a nodetool status on a node in DC1, 
> the nodes in DC2 are reported as down. When tried in DC2, nodes in DC1 are 
> shown as down.
> In the Cassandra logs, we can clearly see that handshaking is failing 
> every 5 seconds for communication between data centers. At the TCP level, 
> there are too many FIN_WAIT1 connections generated by Cassandra, which is 
> still a puzzle. The number of CLOSE_WAIT transitions due to this is very 
> high. Because of this kind of TCP listen-drop problem, we moved from 2.0.1 
> to 2.0.3. In 2.0.1, it happened within a data center itself; here it is 
> between data centers. In case it has anything to do with the snitch 
> configuration, I am using GossipingPropertyFileSnitch.
> This clearly started happening after the private link failure. Any idea on 
> this?
> The Cassandra version used is 2.0.3.
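For reference, GossipingPropertyFileSnitch takes each node's DC and rack from `cassandra-rackdc.properties` and gossips them to the rest of the cluster. A minimal sketch of that file, with illustrative values only (the DC/rack names below are assumptions, not taken from the reporter's cluster):

```
# conf/cassandra-rackdc.properties (one per node; values are illustrative)
dc=DC1
rack=RAC1
# prefer_local=true   # optional: use the node's local address within its own DC
```

The snitch itself should not affect TCP reconnection behavior, but mismatched DC names here would make nodes appear in the wrong data center in nodetool status.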



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
