[jira] [Commented] (CASSANDRA-8352) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239271#comment-14239271 ]

Amit Singh Chowdhery commented on CASSANDRA-8352:
-------------------------------------------------

We have upgraded to Cassandra 2.0.11 and are still facing the same problem.

Gist: We have two 3-node clusters in two different DCs. If one or more nodes go down in one data center, ~5-10% traffic failure is observed in the other.
CL: LOCAL_QUORUM
RF=3

> Timeout Exception on Node Failure in Remote Data Center
> -------------------------------------------------------
>
>                 Key: CASSANDRA-8352
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8352
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Unix, Cassandra 2.0.3
>            Reporter: Akhtar Hussain
>              Labels: DataCenter, GEO-Red
>
> We have a Geo-red setup with 2 data centers having 3 nodes each. When we
> bring down a single Cassandra node in DC2 by kill -9, reads fail on DC1
> with TimedOutException for a brief amount of time (~15-20 sec).
> Questions:
> 1. We need to understand why reads fail on DC1 when a node in another DC,
> i.e. DC2, fails. As we are using LOCAL_QUORUM for both reads and writes in
> DC1, a request should return once 2 nodes in the local DC have replied
> instead of timing out because of a node in the remote DC.
> 2. We want to make sure that no Cassandra requests fail in case of node
> failures. We tried the rapid read protection settings ALWAYS, 99percentile,
> and 10ms, as described in
> http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2,
> but none of them helped. How can we ensure zero request failures when a
> node fails?
> 3. What is the right way of handling HTimedOutException exceptions in
> Hector?
> 4. Please confirm whether we are using public/private hostnames as expected.
> We are using Cassandra 2.0.3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
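
For reference, the expectation behind question 1 rests on the usual quorum arithmetic: a quorum is floor(RF / 2) + 1 replicas, and LOCAL_QUORUM counts only replicas in the coordinator's data center. The following is a minimal sketch in Python, assuming RF=3 in each data center as stated in the comment above. The function names are hypothetical and are not part of Cassandra or Hector. It only illustrates replica counting; it does not model whatever coordinator behaviour produces the timeouts reported in this issue.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_met(local_rf: int, local_replicas_up: int) -> bool:
    """LOCAL_QUORUM considers only replicas in the local data center.

    The state of replicas in the remote data center is deliberately
    not an input here.
    """
    return local_replicas_up >= quorum(local_rf)


# Assumed setup from the report: RF=3 per data center.
print(quorum(3))                # 2
print(local_quorum_met(3, 3))   # True: all local replicas up
print(local_quorum_met(3, 2))   # True: one local replica down
print(local_quorum_met(3, 1))   # False: two local replicas down
```

Under this counting, the number of live nodes in DC2 never enters the check for a request coordinated in DC1, which is why the reporters expect DC1 reads to keep succeeding.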
[jira] [Commented] (CASSANDRA-8352) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225828#comment-14225828 ]

Akhtar Hussain commented on CASSANDRA-8352:
-------------------------------------------

Fine, we will test it on Cassandra 2.0.11 and share the results soon. :)
[jira] [Commented] (CASSANDRA-8352) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225819#comment-14225819 ]

Jonathan Ellis commented on CASSANDRA-8352:
-------------------------------------------

Here's how this works: You test the new version to make sure it's something we haven't fixed already. Then we write a fix for the next new version. Please don't reopen until you've done that.