[jira] [Updated] (CASSANDRA-3775) rpc_timeout error when reading from a cluster that just had a node die. Only happens if gossip hasn't noticed the dead node yet.

Cathy Daw (Updated) (JIRA) Mon, 23 Jan 2012 17:16:05 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Cathy Daw updated CASSANDRA-3775:
---------------------------------

    Description: 
Create a cluster of 3 nodes with RF=3 and CL=QUORUM. Insert some data, then kill
a node (not the coordinator) and immediately try to read the data. The read
request fails after about 2 seconds, even though cassandra.yaml has
rpc_timeout=10000 (10 seconds). A failing test has been written in
cassandra-dtest, branch "read_when_node_is_down". The test can be run like this:

nosetests --nocapture read_when_node_down_test.py

Here is the error from the test:

{code}
======================================================================
ERROR: read_when_node_down_test.TestReadWhenNodeDown.read_when_node_down_test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/tahooie/cassandra-dtest/read_when_node_down_test.py", line 40, in read_when_node_down_test
    query_c1c2(cursor, 100, CL)
  File "/home/tahooie/cassandra-dtest/tools.py", line 28, in query_c1c2
    cursor.execute('SELECT c1, c2 FROM cf USING CONSISTENCY %s WHERE key=k%d' % (consistency, key))
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
    raise cql.OperationalError("Request did not complete within rpc_timeout.")
OperationalError: ('Request did not complete within rpc_timeout.', 'reading failed in 2.0130 seconds.')
{code}

I did notice that if I sleep 20 seconds after killing the node and before
reading, the read succeeds. This is probably because gossip has had a
chance to notice that the node is down.
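
The expectation that this read should succeed follows from quorum arithmetic: QUORUM needs floor(RF / 2) + 1 replicas, so with RF=3 the two surviving nodes are enough. A minimal sketch of that check follows; the function names are illustrative only and are not part of Cassandra or cassandra-dtest.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor, live_replicas):
    """True if enough replicas are alive to satisfy a QUORUM request."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3
    # One node killed, as in the scenario described above.
    live = rf - 1
    print("QUORUM for RF=%d needs %d replicas" % (rf, quorum(rf)))
    print("Reachable with %d live replicas: %s" % (live, quorum_reachable(rf, live)))
```

With one of three replicas down, the check passes, which is why a timeout here looks like a bug and not an availability limit.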

  was:
Create a cluster of 3 nodes with RF=3 and CL=QUORUM. Insert some data then kill 
a node (not the coordinator) and immediately try to read the data. The read 
request will fail within about 2 seconds. cassandra.yaml has rpc_timeout=10000. 
A failing test has been written in cassandra-dtest, branch 
"read_when_node_is_down". The test can be run like this: nosetests --nocapture 
read_when_node_down_test.py Here is the error from the test:

======================================================================
ERROR: read_when_node_down_test.TestReadWhenNodeDown.read_when_node_down_test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/tahooie/cassandra-dtest/read_when_node_down_test.py", line 40, in 
read_when_node_down_test
    query_c1c2(cursor, 100, CL)
  File "/home/tahooie/cassandra-dtest/tools.py", line 28, in query_c1c2
    cursor.execute('SELECT c1, c2 FROM cf USING CONSISTENCY %s WHERE key=k%d' % 
(consistency, key))
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in 
execute
    raise cql.OperationalError("Request did not complete within rpc_timeout.")
OperationalError: ('Request did not complete within rpc_timeout.', 'reading 
failed in 2.0130 seconds.')

I did notice that if I sleep 20 seconds after killing the node and before 
reading, that the read succeeds. This is probably because gossip has had a 
chance to notice that the node is down.

    
> rpc_timeout error when reading from a cluster that just had a node die. Only 
> happens if gossip hasn't noticed the dead node yet.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3775
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3775
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.6
>         Environment: ubuntu. Used ccm to create the cluster.
>            Reporter: Tyler Patterson
>
> Create a cluster of 3 nodes with RF=3 and CL=QUORUM. Insert some data, then
> kill a node (not the coordinator) and immediately try to read the data. The
> read request fails after about 2 seconds, even though cassandra.yaml has
> rpc_timeout=10000 (10 seconds). A failing test has been written in
> cassandra-dtest, branch "read_when_node_is_down". The test can be run like
> this:
>
> nosetests --nocapture read_when_node_down_test.py
>
> Here is the error from the test:
> {code}
> ======================================================================
> ERROR: read_when_node_down_test.TestReadWhenNodeDown.read_when_node_down_test
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
>     self.test(*self.arg)
>   File "/home/tahooie/cassandra-dtest/read_when_node_down_test.py", line 40, in read_when_node_down_test
>     query_c1c2(cursor, 100, CL)
>   File "/home/tahooie/cassandra-dtest/tools.py", line 28, in query_c1c2
>     cursor.execute('SELECT c1, c2 FROM cf USING CONSISTENCY %s WHERE key=k%d' % (consistency, key))
>   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
>     raise cql.OperationalError("Request did not complete within rpc_timeout.")
> OperationalError: ('Request did not complete within rpc_timeout.', 'reading failed in 2.0130 seconds.')
> {code}
> I did notice that if I sleep 20 seconds after killing the node and before
> reading, the read succeeds. This is probably because gossip has had a
> chance to notice that the node is down.
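
The observation that the read succeeds after a 20-second sleep suggests a test-side workaround: poll the read until it succeeds instead of sleeping for a fixed time. The helper below is a hypothetical sketch, not part of cassandra-dtest; `retry_until_success` and its parameters are assumed names, and retrying only masks the underlying bug.

```python
import time


def retry_until_success(operation, max_wait=20.0, interval=1.0,
                        retry_on=(Exception,), sleep=time.sleep,
                        clock=time.time):
    """Call operation() until it returns, retrying on the given exceptions.

    Re-raises the last exception once max_wait seconds have elapsed.
    The sleep and clock arguments exist so tests can substitute fakes.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return operation()
        except retry_on:
            if clock() >= deadline:
                raise
            sleep(interval)
```

In the failing test, `operation` might wrap the `query_c1c2(cursor, 100, CL)` call with `retry_on=(cql.OperationalError,)`, assuming those names are in scope as in the traceback.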

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
