[ https://issues.apache.org/jira/browse/CASSANDRA-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cathy Daw updated CASSANDRA-3775:
---------------------------------

    Description: 
Create a cluster of 3 nodes with RF=3 and CL=QUORUM. Insert some data, then kill a node (not the coordinator) and immediately try to read the data. The read request fails within about 2 seconds, although cassandra.yaml has rpc_timeout=10000.

A failing test has been written in cassandra-dtest, branch "read_when_node_is_down". The test can be run like this:

nosetests --nocapture read_when_node_down_test.py

Here is the error from the test:
{code}
======================================================================
ERROR: read_when_node_down_test.TestReadWhenNodeDown.read_when_node_down_test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/tahooie/cassandra-dtest/read_when_node_down_test.py", line 40, in read_when_node_down_test
    query_c1c2(cursor, 100, CL)
  File "/home/tahooie/cassandra-dtest/tools.py", line 28, in query_c1c2
    cursor.execute('SELECT c1, c2 FROM cf USING CONSISTENCY %s WHERE key=k%d' % (consistency, key))
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
    raise cql.OperationalError("Request did not complete within rpc_timeout.")
OperationalError: ('Request did not complete within rpc_timeout.', 'reading failed in 2.0130 seconds.')
{code}

I did notice that if I sleep 20 seconds after killing the node and before reading, the read succeeds. This is probably because gossip has had a chance to notice that the node is down.

  was:
The same description, without the {code} markers around the test output.

> rpc_timeout error when reading from a cluster that just had a node die. Only
> happens if gossip hasn't noticed the dead node yet.
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3775
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3775
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.6
>         Environment: ubuntu. Used ccm to create the cluster.
>            Reporter: Tyler Patterson
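
For reference, the quorum arithmetic behind the report can be sketched as below. This is only an illustration, not code from cassandra-dtest; the helper names quorum and quorum_reachable are hypothetical. It assumes the usual definition of QUORUM as floor(RF / 2) + 1 replicas, which for RF=3 means two live replicas are enough, so a read with one node down would be expected to succeed rather than time out.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor, nodes_down):
    """True if enough replicas remain alive to satisfy QUORUM."""
    alive = replication_factor - nodes_down
    return alive >= quorum(replication_factor)


if __name__ == "__main__":
    # Scenario from the report: RF=3, one non-coordinator node killed.
    print(quorum(3))               # replicas required
    print(quorum_reachable(3, 1))  # one node down
    print(quorum_reachable(3, 2))  # two nodes down
```

With RF=3 and one node down, two replicas remain and QUORUM is still reachable, which is why the 2-second failure looks like a failure-detection problem rather than a genuine lack of replicas.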