[jira] [Commented] (CASSANDRA-16334) Replica failure causes timeout on multi-DC write

Aleksandr Sorokoumov (Jira) Sat, 09 Oct 2021 05:09:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426575#comment-17426575
 ]


Aleksandr Sorokoumov commented on CASSANDRA-16334:
--------------------------------------------------

I have described the root cause in the previous comment. Two distinct bugs make 
replica failures appear as timeouts: one affects DC-local consistency levels and 
the other affects global consistency levels. Fixing the latter also resolves the 
"zombie-hint" issue I described at the end of the previous message.

A replica failure appears as a timeout at DC-local consistency levels because 
{{AbstractWriteResponseHandler}} counts nodes in all DCs as potential candidates 
to wait for. The fix is to wait only for the DC-local nodes.
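A simplified model of the first bug may help. It assumes the handler gives up 
early once block_for + failures exceeds the number of candidate replicas; the 
function name and exact rule are illustrative, not copied from 
{{AbstractWriteResponseHandler}}. The numbers come from the LOCAL_ONE 
reproduction quoted below (1 required response, 3 local failures).

```python
# Hypothetical, simplified model of write-failure accounting.
# Not Cassandra's actual code; names are illustrative only.

def should_signal_failure(block_for, failures, candidates):
    """Return True once the remaining candidates can no longer reach block_for.

    block_for:  responses required by the consistency level (1 for LOCAL_ONE)
    failures:   failure responses received from replicas being waited for
    candidates: replicas the handler considers able to respond
    """
    return block_for + failures > candidates


# LOCAL_ONE, RF=3 in each of two DCs, all three local replicas reject the write.
BLOCK_FOR = 1
LOCAL_FAILURES = 3

# Counting replicas in every DC: 1 + 3 > 6 is False, so the handler keeps
# waiting for responses that will never come and eventually times out.
buggy = should_signal_failure(BLOCK_FOR, LOCAL_FAILURES, candidates=6)

# Counting only DC-local replicas: 1 + 3 > 3 is True, so the client is told
# about the failure instead of getting a timeout.
fixed = should_signal_failure(BLOCK_FOR, LOCAL_FAILURES, candidates=3)

print(f"all-DC candidates: fail fast = {buggy}")    # False
print(f"DC-local candidates: fail fast = {fixed}")  # True
```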

The second bug, which causes both the "zombie-hint" issue and the timeout with 
global consistency levels, concerns routing replica failures to the correct 
address. This patch makes replicas send request failures to the original 
coordinator rather than to the DC-local node that forwarded them the message. In 
addition, in 3.0 and 3.11 I added the missing respond-on-failure flag to 
forwarded messages.
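The routing change can be sketched as follows. This is a hypothetical model: 
the ForwardedWrite type, the node names and the topology (the coordinator 
writes to its three local replicas and to one remote replica, which relays the 
write to the other two, as in the 3:3 reproduction cluster) are assumptions 
for illustration. It counts how many of six replica failures reach the 
original coordinator when failures are sent to the delivering node versus the 
origin.

```python
from dataclasses import dataclass

# Hypothetical model of cross-DC forwarding; node names are illustrative.


@dataclass(frozen=True)
class ForwardedWrite:
    origin: str  # coordinator that received the client request
    sender: str  # node that delivered this copy (origin or a remote forwarder)


def failure_reply_target(msg: ForwardedWrite, fixed: bool) -> str:
    # Before the fix, a failure goes back to whoever delivered the message;
    # after the fix, it always goes to the original coordinator.
    return msg.origin if fixed else msg.sender


def failures_reaching_coordinator(fixed: bool) -> int:
    coordinator = "dc1-node1"
    forwarder = "dc2-node1"
    deliveries = [
        # DC-local replicas are contacted directly by the coordinator.
        ForwardedWrite(origin=coordinator, sender=coordinator),  # dc1-node1
        ForwardedWrite(origin=coordinator, sender=coordinator),  # dc1-node2
        ForwardedWrite(origin=coordinator, sender=coordinator),  # dc1-node3
        # The remote forwarder receives its copy directly...
        ForwardedWrite(origin=coordinator, sender=coordinator),  # dc2-node1
        # ...and relays the write to the other remote replicas.
        ForwardedWrite(origin=coordinator, sender=forwarder),    # dc2-node2
        ForwardedWrite(origin=coordinator, sender=forwarder),    # dc2-node3
    ]
    # Every replica rejects the write; count failures addressed to the coordinator.
    return sum(failure_reply_target(m, fixed) == coordinator for m in deliveries)


print(failures_reaching_coordinator(fixed=False))  # 4
print(failures_reaching_coordinator(fixed=True))   # 6
```

With sender-based routing, the two relayed failures never reach the 
coordinator, so a handler that needs them in order to give up early can only 
wait until it times out.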

Patches:

* [dtest|https://github.com/apache/cassandra-dtest/pull/165]
* [3.0|https://github.com/apache/cassandra/pull/1259]
* [3.11|https://github.com/apache/cassandra/pull/1260]
* [4.0|https://github.com/apache/cassandra/pull/1261]
* [trunk|https://github.com/apache/cassandra/pull/1262]

[~paulo] Can you please start the CI?

> Replica failure causes timeout on multi-DC write
> ------------------------------------------------
>
>                 Key: CASSANDRA-16334
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16334
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Messaging/Internode
>            Reporter: Paulo Motta
>            Assignee: Aleksandr Sorokoumov
>            Priority: Normal
>
> Inserting a mutation larger than {{max_mutation_size_in_kb}} correctly throws 
> a write error on a single DC keyspace with RF=3:
> {noformat}
> cassandra.WriteFailure: Error from server: code=1500 [Replica(s) failed to 
> execute write] message="Operation failed - received 0 responses and 3 
> failures: UNKNOWN from /127.0.0.3:7000, UNKNOWN from /127.0.0.2:7000, UNKNOWN 
> from /127.0.0.1:7000" info={'consistency': 'LOCAL_ONE', 'required_responses': 
> 1, 'received_responses': 0, 'failures': 3}
> {noformat}
> The same insert wrongly causes a timeout on a keyspace with 2 DCs (RF=3 each):
> {noformat}
> cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed 
> out waiting for replica nodes' responses] message="Operation timed out - 
> received only 0 responses." info={'consistency': 'LOCAL_ONE', 
> 'required_responses': 1, 'received_responses': 0}
> {noformat}
> Reproduction steps:
> {noformat}
> # Setup cluster: two DCs with three nodes each
> ccm create -n 3:3 test
> for i in {1..6}; do
>   echo 'max_mutation_size_in_kb: 1000' >> ~/.ccm/test/node$i/conf/cassandra.yaml
> done
> ccm start
> # Create schema
> ccm node1 cqlsh
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
> CREATE TABLE test.test (key int PRIMARY KEY, val blob);
> exit;
> # Create a 2 MB input file
> head -c 2097152 /dev/urandom > 2mbBlob
> # Insert data
> python
> from cassandra.cluster import Cluster
> cluster = Cluster()
> session = cluster.connect('test')
> blob = open("2mbBlob", "rb").read().hex()
> session.execute("INSERT INTO test (key, val) VALUES (1, textAsBlob('" + blob + "'))")
> {noformat}
> Reproduced in 3.0, 3.11, 4.0, trunk.
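As a rough size check on the reproduction, assuming 2mbBlob is exactly 2 MiB 
and that max_mutation_size_in_kb is interpreted in KiB: the script hex-encodes 
the file and passes the hex string through textAsBlob, so the stored blob is 
twice the file size. This counts only the blob, not the rest of the mutation.

```python
# Assumption: 2mbBlob is exactly 2 MiB.
FILE_BYTES = 2 * 1024 * 1024   # raw file size
HEX_CHARS = FILE_BYTES * 2     # bytes.hex() emits two ASCII characters per byte
BLOB_BYTES = HEX_CHARS         # textAsBlob stores the UTF-8 bytes of the ASCII hex text
LIMIT_BYTES = 1000 * 1024      # max_mutation_size_in_kb: 1000, assuming KiB units

print(BLOB_BYTES, LIMIT_BYTES, BLOB_BYTES > LIMIT_BYTES)  # 4194304 1024000 True
```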



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
