[jira] [Updated] (CASSANDRA-6608) Cassandra timeout on node failure

Ankit Patel (JIRA) Tue, 21 Jan 2014 10:31:54 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ankit Patel updated CASSANDRA-6608:
-----------------------------------

    Description: 

We are seeing a strange issue with our Cassandra cluster (version 1.0.10). We
have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas of each
other. All reads and writes use LOCAL_QUORUM. When one of the nodes in DC1
fails, we see timeout errors on the second node for reads. With DEBUG-level
logging turned on, we see the following error in the Cassandra logs:

DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read 
timeout: java.util.concurrent.TimeoutException: Operation timed out - received 
only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .

Considering that LOCAL_QUORUM only needs 2 of the 3 nodes in the DC, I am
surprised we are seeing this issue. The log clearly says it received 2
responses. Interestingly, when we connect to the third node after the second
node returns the timeout error, it works as expected. Has anyone else faced
this issue?
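
For reference, the quorum arithmetic behind the expectation above can be
sketched as follows. This is only an illustration, assuming a replication
factor of 3 in each datacenter (inferred from the statement that all 6 nodes
are replicas of each other). The function name local_quorum is hypothetical
and is not part of Cassandra.

```python
def local_quorum(replication_factor: int) -> int:
    """Replicas in the local datacenter that must respond: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Assumption: RF = 3 in each datacenter (DC1:3, DC2:3).
rf_per_dc = 3
required = local_quorum(rf_per_dc)
tolerated_down = rf_per_dc - required

print(f"LOCAL_QUORUM with RF={rf_per_dc}: needs {required} responses, "
      f"tolerates {tolerated_down} replica(s) down")
```

Under that assumption, a read that has received 2 responses from the local
datacenter would be expected to satisfy LOCAL_QUORUM, which is why the timeout
in the log above is surprising.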


  was:
We are seeing a strange issue with our Cassandra cluster (version 1.0.10). We
have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas of each
other. All reads and writes use LOCAL_QUORUM. When one of the nodes in DC1
fails, we see timeout errors. With DEBUG-level logging turned on, we see the
following error in the Cassandra logs:

DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read 
timeout: java.util.concurrent.TimeoutException: Operation timed out - received 
only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .

Considering that LOCAL_QUORUM only needs 2 of the 3 nodes in the DC, I am
surprised we are seeing this issue. Interestingly, when we connect to the
third node after the second node returns the timeout error, it works as
expected.



> Cassandra timeout on node failure
> ---------------------------------
>
>                 Key: CASSANDRA-6608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6608
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ankit Patel
>
> We are seeing a strange issue with our Cassandra cluster (version 1.0.10).
> We have 6 nodes (DC1:3, DC2:3) in our cluster, so all 6 nodes are replicas
> of each other. All reads and writes use LOCAL_QUORUM. When one of the nodes
> in DC1 fails, we see timeout errors on the second node for reads. With
> DEBUG-level logging turned on, we see the following error in the Cassandra
> logs:
> DEBUG [Thrift:322] 2013-12-20 14:30:20,123 StorageProxy.java (line 676) Read 
> timeout: java.util.concurrent.TimeoutException: Operation timed out - 
> received only 2 responses from / xxx.xxx.xxx.IP1, xxx.xxx.xxx.IP2, .
> Considering that LOCAL_QUORUM only needs 2 of the 3 nodes in the DC, I am
> surprised we are seeing this issue. The log clearly says it received 2
> responses. Interestingly, when we connect to the third node after the second
> node returns the timeout error, it works as expected. Has anyone else faced
> this issue?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
