Wei Zhu created CASSANDRA-8819: ---------------------------------- Summary: LOCAL_QUORUM writes returns wrong message Key: CASSANDRA-8819 URL: https://issues.apache.org/jira/browse/CASSANDRA-8819 Project: Cassandra Issue Type: Bug Components: Core Environment: CentOS 6.6 Reporter: Wei Zhu Fix For: 2.0.8
We have two DC3, each with 7 nodes. Here is the keyspace setup: create keyspace test with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC2 : 3, DC1 : 3} and durable_writes = true; We brought down two nodes in DC2 for maintenance. We only write to DC1 using local_quroum (using datastax JavaClient) But we see this errors in the log: Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica were required but only 3 acknowledged the write why does it say 4 replica were required? and Why would it give error back to client since local_quorum should succeed. Here are the output from nodetool status Note: Ownership information does not include topology; for complete information, specify a keyspace Datacenter: DC2 =============== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.2.0.1 10.92 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC206 UN 10.2.0.2 6.17 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106 UN 10.2.0.3 6.63 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107 DL 10.2.0.4 1.54 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107 UN 10.2.0.5 6.02 GB 256 6.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106 UJ 10.2.0.6 3.68 GB 256 ? XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205 UN 10.2.0.7 7.22 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205 Datacenter: DC1 =============== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.1.0.1 6.04 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10 UN 10.1.0.2 7.55 GB 256 7.4% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC8 UN 10.1.0.3 5.83 GB 256 7.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9 UN 10.1.0.4 7.34 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC6 UN 10.1.0.5 7.57 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC7 UN 10.1.0.6 5.31 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10 UN 10.1.0.7 5.47 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9 I did a cql trace on the query and here is the trace, and it does say Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 | 2002873 at the end. I guess that is where the client gets the error from. Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890 activity | timestamp | source | source_elapsed ---------------------------------------------------------------------------------------------------------------------------------------------+--------------+-------------+---------------- execute_cql3_query | 17:27:50,828 | 10.1.0.1 | 0 Parsing insert into test (user_id, created, event_data, event_id)values ( 123456789 , 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 17:27:50,828 | 10.1.0.1 | 39 Preparing statement | 17:27:50,828 | 10.1.0.1 | 135 Message received from /10.1.0.1 | 17:27:50,829 | 10.1.0.5 | 25 Sending message to /10.1.0.5 | 17:27:50,829 | 10.1.0.1 | 421 Executing single-partition query on users | 17:27:50,829 | 10.1.0.5 | 177 Acquiring sstable references | 17:27:50,829 | 10.1.0.5 | 191 Merging memtable tombstones | 17:27:50,830 | 10.1.0.5 | 208 Message received from /10.1.0.5 | 17:27:50,830 | 10.1.0.1 | 1461 Message received from /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 14 Key cache hit for sstable 1 | 17:27:50,830 | 10.1.0.5 | 254 Processing response from /10.1.0.5 | 17:27:50,830 | 10.1.0.1 | 1514 Executing single-partition query on users | 17:27:50,830 | 10.1.0.2 | 78 Seeking to partition beginning in data file | 17:27:50,830 | 10.1.0.5 | 264 Sending message to /10.1.0.2 | 17:27:50,830 | 10.1.0.1 | 1517 Acquiring sstable references | 17:27:50,830 | 10.1.0.2 | 85 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 | 10.1.0.5 | 453 Determining replicas for mutation | 17:27:50,830 | 10.1.0.1 | 1746 Merging memtable tombstones | 17:27:50,830 | 10.1.0.2 | 97 Merging data from memtables and 1 sstables | 17:27:50,830 | 10.1.0.5 | 476 Message received from /10.1.0.2 | 17:27:50,830 | 10.1.0.1 | 2369 Key cache hit for sstable 2 | 17:27:50,830 | 10.1.0.2 | 120 Read 1 live and 0 tombstoned cells | 17:27:50,830 | 10.1.0.5 | 506 Seeking to partition beginning in data file | 17:27:50,830 | 10.1.0.2 | 123 Enqueuing response to /10.1.0.1 | 17:27:50,830 | 10.1.0.5 | 570 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 | 10.1.0.2 | 288 Sending message to /10.1.0.1 | 17:27:50,830 | 10.1.0.5 | 617 Merging data from memtables and 1 sstables | 17:27:50,830 | 10.1.0.2 | 297 Read 1 live and 0 tombstoned cells | 17:27:50,830 | 10.1.0.2 | 319 Enqueuing response to /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 362 Sending message to /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 420 Message received from /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 17 Sending message to /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2435 Message received from /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 8 Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.4 | 61 Processing response from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2488 Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.2 | 44 Appending to commitlog | 17:27:50,831 | 10.1.0.4 | 78 Not hinting /10.2.0.4 which has been down 364809650ms | 17:27:50,831 | 10.1.0.1 | 2503 Appending to commitlog | 17:27:50,831 | 10.1.0.2 | 62 Adding to event memtable | 17:27:50,831 | 10.1.0.4 | 96 Sending message to /10.1.0.4 | 17:27:50,831 | 10.1.0.1 | 2557 Adding to event memtable | 17:27:50,831 | 10.1.0.2 | 80 Enqueuing response to /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 177 Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.1 | 2669 Enqueuing response to /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 160 Sending message to /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 333 Appending to commitlog | 17:27:50,831 | 10.1.0.1 | 2682 Sending message to /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 266 Adding to event memtable | 17:27:50,831 | 10.1.0.1 | 2720 Sending message to /10.2.0.5 | 17:27:50,831 | 10.1.0.1 | 2758 Message received from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2989 Processing response from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 3024 Message received from /10.1.0.4 | 17:27:50,832 | 10.1.0.1 | 3764 Processing response from /10.1.0.4 | 17:27:50,832 | 10.1.0.1 | 3805 Message received from /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 24 Enqueuing forwarded write to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 | 255 Enqueuing forwarded write to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 | 283 Enqueuing forwarded write to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 | 307 Acquiring switchLock read lock | 17:27:50,841 | 10.2.0.5 | 362 Appending to commitlog | 17:27:50,841 | 10.2.0.5 | 380 Sending message to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 | 382 Adding to event memtable | 17:27:50,841 | 10.2.0.5 | 411 Sending message to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 | 429 Enqueuing response to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 484 Sending message to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 | 561 Sending message to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 625 Acquiring switchLock read lock | 17:27:50,842 | 10.2.0.6 | 1031 Acquiring switchLock read lock | 17:27:50,842 | 10.2.0.7 | 178 Appending to commitlog | 17:27:50,842 | 10.2.0.6 | 1066 Appending to commitlog | 17:27:50,842 | 10.2.0.7 | 196 Adding to event memtable | 17:27:50,842 | 10.2.0.6 | 1118 Adding to event memtable | 17:27:50,842 | 10.2.0.7 | 231 Enqueuing response to /10.1.0.1 | 17:27:50,842 | 10.2.0.6 | 1181 Enqueuing response to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 | 286 Sending message to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 | 421 Acquiring switchLock read lock | 17:27:50,843 | 10.2.0.3 | 1216 Sending message to /10.1.0.1 | 17:27:50,843 | 10.2.0.6 | 1382 Appending to commitlog | 17:27:50,843 | 10.2.0.3 | 1239 Adding to event memtable | 17:27:50,843 | 10.2.0.3 | 1313 Enqueuing response to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 | 1407 Sending message to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 | 1631 Message received from /10.2.0.5 | 17:27:50,851 | 10.1.0.1 | 23333 Processing response from /10.2.0.5 | 17:27:50,851 | 10.1.0.1 | 23380 Message received from /10.2.0.7 | 17:27:50,852 | 10.1.0.1 | 23908 Processing response from /10.2.0.7 | 17:27:50,852 | 10.1.0.1 | 23953 Message received from /10.2.0.6 | 17:27:50,853 | 10.1.0.1 | 25143 Processing response from /10.2.0.6 | 17:27:50,853 | 10.1.0.1 | 25178 Message received from /10.2.0.3 | 17:27:50,854 | 10.1.0.1 | 25478 Processing response from /10.2.0.3 | 17:27:50,854 | 10.1.0.1 | 25516 Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 | 2002873 Request complete | 17:27:52,833 | 10.1.0.1 | 2005989 -- This message was sent by Atlassian JIRA (v6.3.4#6332)